How to get each individual replacement result from a Regex replacement? - java

question
How to get each individual replacement result from a Regex replacement?
ex
String regexMatchedWord = matcher.group(); gives me the current match.
But is there something like String regexMatchedSubstitution = matcher.currentMatchedReplacementResult(); that gives me the current replacement result?
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String content_SearchOn = "Sample sentence: snake, snail, snow, spider";
String regexStrSubstitution = "$2$3$1";
String regexStrMatchFor = "(s)(.)(.)";
Matcher matcher = Pattern.compile(regexStrMatchFor).matcher(content_SearchOn);
ArrayList<String> arr_regexMatchedWord = new ArrayList<>();
ArrayList<String> arr_regexMatchedSubstitution = new ArrayList<>();
StringBuilder sb_content_Replaced = new StringBuilder();
while (matcher.find()) {
String regexMatchedWord = matcher.group();
arr_regexMatchedWord.add(regexMatchedWord);
matcher.appendReplacement(sb_content_Replaced, regexStrSubstitution);
String regexMatchedSubstitution = null; // << What should I put here -- to get each replacement result?
arr_regexMatchedSubstitution.add(regexMatchedSubstitution);
}
matcher.appendTail(sb_content_Replaced);
System.out.println(sb_content_Replaced); // Sample enstence: naske, nasil, nosw, pisder
System.out.println(arr_regexMatchedWord); // [sen, sna, sna, sno, spi]
System.out.println(arr_regexMatchedSubstitution); // [ens, nas, nas, nos, pis] // << expect
}
}
comments
If Java is not able to do this, can any other language do it (JavaScript? Python?)
Update: potential solution (workaround)
As discussed in the comments, one simple approach might be:
convert each $1 into group(1) programmatically,
but you have to watch out for escape characters such as \ that have special meaning...
Another way might be:
use reflection to somehow get at the local variable result that is filled by appendExpandedReplacement(replacement, result); in the source of java.util.regex.Matcher.appendReplacement(StringBuilder, String)
public Matcher appendReplacement(StringBuilder sb, String replacement) {
// If no match, return error
if (first < 0)
throw new IllegalStateException("No match available");
StringBuilder result = new StringBuilder();
appendExpandedReplacement(replacement, result);
// Append the intervening text
sb.append(text, lastAppendPosition, first);
// Append the match substitution
sb.append(result);
lastAppendPosition = last;
modCount++;
return this;
}
Or:
Record the end index before the append, then read from that index after the append to get the appended replacement.

solution (workaround) Java implementation
#logic::
Record the end index before the append, then read from that index after the append to get the appended replacement.
#code::
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String content_SearchOn = "Sample sentence: snake, snail, snow, spider";
String regexStrSubstitution = "$2$3x$1";
String regexStrMatchFor = "(s)(.)(.).";
Matcher matcher = Pattern.compile(regexStrMatchFor).matcher(content_SearchOn);
ArrayList<String> arr_regexMatchedWord = new ArrayList<>();
ArrayList<String> arr_regexMatchedSubstitution = new ArrayList<>();
StringBuilder sb_content_SearchOn = new StringBuilder(content_SearchOn);
StringBuilder sb_content_Replaced = new StringBuilder();
String content_OriPlusCurrAppendSubsti = null;
StringBuilder sb_CurrAppendSubsti_buffer = null;
int indStart_g0_curr = -1;
int indEnd_g0_curr = -1;
int indStart_g0_prev = -1;
int indEnd_g0_prev = -1;
while (matcher.find()) {
// #>>>#
String regexMatchedWord = matcher.group();
indStart_g0_curr = matcher.start();
indEnd_g0_curr = matcher.end();
arr_regexMatchedWord.add(regexMatchedWord);
// #>>>
// #main[business logic]::
// <strike> length_sb_content_Replaced_prev = sb_content_Replaced.length();
// <strike> String regexMatchedSubstitution = sb_content_Replaced.substring(length_sb_content_Replaced_prev);
// #note: it appends both the `the intervening text` + `the match substitution` ...
// Checked: calling matcher.appendReplacement(new StringBuilder(), regexStrSubstitution) a second time for the same match breaks, so append once into a temporary buffer instead.
//~ matcher.appendReplacement(sb_content_Replaced, regexStrSubstitution);
sb_CurrAppendSubsti_buffer = new StringBuilder();
matcher.appendReplacement(sb_CurrAppendSubsti_buffer, regexStrSubstitution + "_$0");
sb_content_Replaced.append(sb_CurrAppendSubsti_buffer);
// #main;;
// #>>>
// #main[get the individual replacement result]::
//~ String regexMatchedSubstitution = null; // << What should I put here -- to get each replacement result?
if (indEnd_g0_prev == -1) {
content_OriPlusCurrAppendSubsti = "";
} else {
content_OriPlusCurrAppendSubsti = sb_content_SearchOn.substring(0, indEnd_g0_prev);
}
content_OriPlusCurrAppendSubsti += sb_CurrAppendSubsti_buffer;
String regexMatchedSubstitution = content_OriPlusCurrAppendSubsti.substring(indStart_g0_curr);
arr_regexMatchedSubstitution.add(regexMatchedSubstitution);
// #main;;
// #>>>#
indStart_g0_prev = indStart_g0_curr;
indEnd_g0_prev = indEnd_g0_curr;
}
matcher.appendTail(sb_content_Replaced);
//
System.out.println(sb_content_Replaced); // Sample enxs_sentence: naxs_snake, naxs_snail, noxs_snow, pixs_spider
System.out.println(arr_regexMatchedWord); // [sent, snak, snai, snow, spid]
System.out.println(arr_regexMatchedSubstitution); // [enxs_sent, naxs_snak, naxs_snai, noxs_snow, pixs_spid] // << expect
}
}
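A more compact variant of the same idea, as a sketch only. appendReplacement appends the intervening text followed by the expanded replacement, so the replacement alone starts at (builder length before the call) + (length of the intervening text). The class and method names below are illustrative, not part of any library, and the StringBuilder overload of appendReplacement assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementCapture {
    /** Returns the expanded replacement for each match, in match order. */
    static List<String> expandedReplacements(String input, String regex, String replacement) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> replacements = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        int prevEnd = 0; // end of the previous match in the input
        while (matcher.find()) {
            int lengthBefore = sb.length();
            // Text between the previous match and this one; appendReplacement copies it first.
            int interveningLength = matcher.start() - prevEnd;
            matcher.appendReplacement(sb, replacement);
            // Everything after the intervening text is the expanded replacement.
            replacements.add(sb.substring(lengthBefore + interveningLength));
            prevEnd = matcher.end();
        }
        return replacements;
    }

    public static void main(String[] args) {
        String input = "Sample sentence: snake, snail, snow, spider";
        System.out.println(expandedReplacements(input, "(s)(.)(.)", "$2$3$1"));
        // [ens, nas, nas, nos, pis]
    }
}
```

Because the replacement string is passed to appendReplacement unchanged, escapes such as \$ are handled by the Matcher itself rather than re-implemented by hand.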
solution (workaround) Javascript implementation
#logic::
Simply brute-force it with hard-coded delimiter strings:
replaceAll() -- wrap each replacement in the delimiters during replacement
matchAll() -- find the replacements that were enclosed in the delimiters
#code (moved from specific example to a general class [here])::
class RegexUtil {
// https://stackoverflow.com/questions/3561493/is-there-a-regexp-escape-function-in-javascript
// https://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
/**
* @param {String} literal_string
* @returns {String}
*/
static escapeRegex(literal_string) {
return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}
/**
* @param {String} string
* @returns {String}
*/
static escapeRegexReplacement(string) {
return string.replace(/\$/g, '$$$$');
}
/**
* @param {String} content_SearchOn
* @param {RegExp} regexMatchFor
* @param {String} regexStrSubstitution
* @returns {String[]}
*/
static get_RegexMatchedReplacement(content_SearchOn, regexMatchFor, regexStrSubstitution) {
const arr_regexMatchedSubstitution = [];
let time_now;
let delim_regexMatchedSub_left;
let delim_regexMatchedSub_right;
/** @type {IterableIterator<RegExpMatchArray>} */ let itr;
let i = 0;
do {
i++;
if (i === 50) {
throw new Error('Many loops tried, Unable to brute force with hardcode string indicator in regex. (The chance of this happening is nearly impossible.)');
}
time_now = Date.now();
delim_regexMatchedSub_left = '#drmsL' + time_now + ';';
delim_regexMatchedSub_right = '#drmsR' + time_now + ';';
itr = content_SearchOn.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '|' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'g'));
} while (itr.next().done !== true);
const content_Replaced_WithDelimiter = content_SearchOn.replaceAll(regexMatchFor, RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_left) + regexStrSubstitution + RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_right));
itr = content_Replaced_WithDelimiter.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '(.*?)' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'gs')); // need flag s
for (const matcher_curr of itr) {
arr_regexMatchedSubstitution.push(matcher_curr[1]);
}
return arr_regexMatchedSubstitution;
}
}
#code (moved from specific example [here] to a general class)::
class RegexUtil {
// https://stackoverflow.com/questions/3561493/is-there-a-regexp-escape-function-in-javascript
// https://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
/**
* @param {String} literal_string
* @returns {String}
*/
static escapeRegex(literal_string) {
return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}
/**
* @param {String} string
* @returns {String}
*/
static escapeRegexReplacement(string) {
return string.replace(/\$/g, '$$$$');
}
}
// TODO: think again about a generic way to escape delimiters that have special meaning in regex ...
const content_SearchOn = 'Sample sentence: snake, snail, snow, spider';
let regexStrSubstitution = '$2$3x$1';
const regexStrMatchFor = '(s)(.)(.).';
const regexFlag = 'gmd';
regexStrSubstitution += '_$&';
const arr_regexMatchedWord = [];
const arr_regexMatchedSubstitution = [];
let time_now;
let delim_regexMatchedSub_left;
let delim_regexMatchedSub_right;
/** @type {IterableIterator<RegExpMatchArray>} */ let itr;
let i = 0;
do {
i++;
if (i === 50) {
throw new Error('Many loops tried, Unable to brute force with hardcode string indicator in regex. (The chance of this happening is nearly impossible.)');
}
time_now = Date.now();
delim_regexMatchedSub_left = '#drmsL' + time_now + ';';
delim_regexMatchedSub_right = '#drmsR' + time_now + ';';
itr = content_SearchOn.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '|' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'g'));
} while (itr.next().done !== true);
const content_Replaced_WithDelimiter = content_SearchOn.replaceAll(new RegExp(regexStrMatchFor, regexFlag), RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_left) + regexStrSubstitution + RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_right));
itr = content_Replaced_WithDelimiter.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '(.*?)' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'gs')); // need flag s
for (const matcher_curr of itr) {
arr_regexMatchedSubstitution.push(matcher_curr[1]);
}
itr = content_SearchOn.matchAll(new RegExp(regexStrMatchFor, regexFlag));
for (const matcher_curr of itr) {
arr_regexMatchedWord.push(matcher_curr[0]);
}
const content_Replaced = content_SearchOn.replaceAll(new RegExp(regexStrMatchFor, regexFlag), regexStrSubstitution);
console.log(content_Replaced); // Sample enxs_sentence: naxs_snake, naxs_snail, noxs_snow, pixs_spider
console.log(arr_regexMatchedWord); // [sent, snak, snai, snow, spid]
console.log(arr_regexMatchedSubstitution); // [enxs_sent, naxs_snak, naxs_snai, noxs_snow, pixs_spid] // << expect
comment (minor)
The reason for brute-forcing it with hard-coded delimiter strings is that JavaScript is even more limited here:
it does not provide an equivalent of appendReplacement()
the replacer callback function does not expand patterns such as $1
Note: The above-mentioned special replacement patterns do not apply for strings returned from the replacer function.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#specifying_a_function_as_the_replacement
That makes the following idea unusable (it could have worked, but it is complex and slow)::
for (const matcher_curr of itr_matcher) {
ind_ReplaceOnlyCurrOne++;
let ind_Match = -1;
function replace_OnlyOneWord_c_for_get_regexMatchedSubstitution(...args) {
ind_Match++;
/** @type {String} */ const g0 = args[0];
if (ind_Match === ind_ReplaceOnlyCurrOne) {
// prettier-ignore
let arg_last = args.at(-1); let ind_g0; let content_SearchOn; let groups;
// prettier-ignore
if (typeof arg_last === 'string') { content_SearchOn = arg_last; ind_g0 = args.at(-2); } else { groups = arg_last; content_SearchOn = args.at(-2); ind_g0 = args.at(-3); }
arr_regexMatchedWord.push(g0);
indStart_g0 = ind_g0;
indEnd_g0 = ind_g0 + g0.length;
return replacer_main(args);
} else {
return RegexUtil.escapeRegexReplacement(g0);
}
}
const content_ReplacedOnlyCurrOne__P1_Pm_P2 = content_SearchOn.replaceAll(new RegExp(regexStrMatchFor, regexFlag), replace_OnlyOneWord_c_for_get_regexMatchedSubstitution);
const Pm_P2 = content_ReplacedOnlyCurrOne__P1_Pm_P2.slice(indStart_g0);
const P2 = content_SearchOn.slice(indEnd_g0);
const regexMatchedSubstitution__Pm = Pm_P2.replaceAll(new RegExp(RegexUtil.escapeRegex(P2)+'$', 'g'), '');
arr_regexMatchedSubstitution.push(regexMatchedSubstitution__Pm);
}

You can use replaceAll(Function<MatchResult, String> replacer) on a Matcher (available since Java 9) to "intercept" the replacement:
String input = "Sample sentence: snake, snail, snow, spider";
List<String> matches = new ArrayList<>();
String result = Pattern.compile("(s)(.)(.)").matcher(input)
.replaceAll(mr -> {
matches.add(mr.group());
return mr.group(2) + mr.group(3) + mr.group(1);
});
System.out.println(result);
System.out.println(matches);
Output:
Sample enstence: naske, nasil, nosw, pisder
[sen, sna, sna, sno, spi]
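The same callback can also record each replacement, which is what the question asks for. The following is a sketch under that assumption; the class and method names are illustrative. Note that the string returned from the function is still interpreted as a replacement pattern (so $ and \ are special), which is why the sketch passes it through Matcher.quoteReplacement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InterceptReplacements {
    /** Replaces every match and records each match and its replacement in the given lists. */
    static String replaceAndCollect(String input, List<String> matches, List<String> replacements) {
        return Pattern.compile("(s)(.)(.)").matcher(input).replaceAll(mr -> {
            String replacement = mr.group(2) + mr.group(3) + mr.group(1);
            matches.add(mr.group());
            replacements.add(replacement);
            // Quote the result so any '$' or '\' in it is treated literally.
            return Matcher.quoteReplacement(replacement);
        });
    }

    public static void main(String[] args) {
        List<String> matches = new ArrayList<>();
        List<String> replacements = new ArrayList<>();
        String result = replaceAndCollect("Sample sentence: snake, snail, snow, spider", matches, replacements);
        System.out.println(result);       // Sample enstence: naske, nasil, nosw, pisder
        System.out.println(matches);      // [sen, sna, sna, sno, spi]
        System.out.println(replacements); // [ens, nas, nas, nos, pis]
    }
}
```

The trade-off is that the replacement is built in code from the groups rather than from a template string such as "$2$3$1".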

Related

Not able to get 4 leading zeros in sha256 hash proof of work- ever (Java)

I'm building a blockchain app.
When I run tests in main, no matter what I do, no matter how much time I give it, when I log different things out, I'm unable to get 4 leading zeroes and so complete a difficulty level of 4. I see the log of the binary hashes and many times they have repeating elements, 1111 for instance, but never 0000 until my time is hit and the difficulty decreases to three. I have no idea why.
I borrowed the hash algorithm from an online source and I checked its output against an online hasher and it checked out.
I know with each level of difficulty it increases exponentially but 2^4 is still only 16 and I see other repeating numbers (1111, 1010, any combination except 0000). Is there any reason why this might be the case?
I wanted to provide an abundance of code rather than a shortage. Logically, if all values were equally likely, it makes no sense that 0000* (e.g. 0000101011) would never turn up at some point. Therefore four zeros must not be possible, but why? I waited 100 seconds multiple times and saw other patterns repeat themselves. Each time, it solved on the dot at exactly 4, 3 or 2 seconds, when the difficulty dropped to three. When I start at difficulty 5 (genesis block) it never solves, and I'm sure it wouldn't even if I left it running overnight. So what could be going on?
package privblock.gerald.ryan;
import java.nio.charset.StandardCharsets;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.Arrays;
import java.util.Date; // gets time in ms.
import privblock.gerald.ryan.util.CryptoHash;
/**
*
* @author Gerald Ryan Block Class of blockchain app
*
* Description: The block hash is the result of the timestamp, the
* last_hash, the data, the difficulty and the nonce
*
*/
public class Block {
long timestamp;
String lastHash;
String hash;
String[] data;
int difficulty;
int nonce;
// Millisecond basis
;
static long MILLISECONDS = 1;
static long SECONDS = 1000 * MILLISECONDS;
static long MINE_RATE = 2 * SECONDS;
/**
* A block is a unit of storage for a blockchain that supports a cryptocurrency.
*
* @param timestamp
* @param lastHash
* @param hash
* @param data
* @param difficulty
* @param nonce
*/
public Block(long timestamp, String lastHash, String hash, String[] data, int difficulty, int nonce) {
super();
this.timestamp = timestamp;
this.lastHash = lastHash;
this.hash = hash;
this.data = data;
this.difficulty = difficulty;
this.nonce = nonce;
}
public String toString() {
return "\n-----------BLOCK--------\ntimestamp: " + this.timestamp + "\nlastHash: " + this.lastHash + "\nhash: "
+ this.hash + "\ndifficulty: " + this.getDifficulty() + "\nNonce: " + this.nonce
+ "\n-----------------------\n";
}
/**
* Mine a block based on given last block and data until a block hash is found
* that meets the leading 0's Proof of Work requirement.
*
* @param last_block
* @param data
* @return
* @throws NoSuchAlgorithmException
*/
public static Block mine_block(Block last_block, String[] data) throws NoSuchAlgorithmException {
long timestamp = new Date().getTime();
String last_hash = last_block.getHash();
int difficulty = Block.adjust_difficulty(last_block, timestamp);
int nonce = 0;
String hash = CryptoHash.getSHA256(timestamp, last_block.getHash(), data, difficulty, nonce);
String proof_of_work = CryptoHash.n_len_string('0', difficulty);
// System.out.println("Proof of work " + proof_of_work);
String binary_hash = CryptoHash.hex_to_binary(hash);
// System.out.println("binary hash " + binary_hash);
String binary_hash_work_end = binary_hash.substring(0, difficulty);
// System.out.println("binary_Hash_work_end " + binary_hash_work_end);
System.out.println("Difficulty: " + difficulty);
while (!proof_of_work.equalsIgnoreCase(binary_hash_work_end)) {
// System.out.println("Working");
nonce += 1;
timestamp = new Date().getTime();
difficulty = Block.adjust_difficulty(last_block, timestamp);
hash = CryptoHash.getSHA256(timestamp, last_block.getHash(), data, difficulty, nonce);
proof_of_work = CryptoHash.n_len_string('0', difficulty);
binary_hash = CryptoHash.hex_to_binary(hash);
binary_hash_work_end = binary_hash.substring(0, difficulty);
// System.out.println(binary_hash_work_end);
// System.out.println(binary_hash);
// System.out.println(proof_of_work);
}
System.out.println("Solved at Difficulty: " + difficulty);
// System.out.println("Proof of work requirement " + proof_of_work);
// System.out.println("binary_Hash_work_end " + binary_hash_work_end);
// System.out.println("binary hash " + binary_hash);
System.out.println("BLOCK MINED");
return new Block(timestamp, last_hash, hash, data, difficulty, nonce);
}
/**
* Generate Genesis block
*
* @return
*/
public static Block genesis_block() {
long timestamp = 1;
String last_hash = "genesis_last_hash";
String hash = "genesis_hash";
String[] data = { "buy", "privcoin" };
int difficulty = 4;
int nonce = 0;
return new Block(timestamp, last_hash, hash, data, difficulty, nonce);
}
/**
* Calculate the adjusted difficulty according to the MINE_RATE. Increase the
* difficulty for quickly mined blocks. Decrease the difficulty for slowly mined
* blocks.
*
* @param last_block
* @param new_timestamp
*/
public static int adjust_difficulty(Block last_block, long new_timestamp) {
long time_diff = new_timestamp - last_block.getTimestamp();
// System.out.println(time_diff);
if (time_diff < MINE_RATE) {
// System.out.println("Increasing difficulty");
return last_block.getDifficulty() + 1;
} else if (last_block.getDifficulty() - 1 > 0) {
// System.out.println("Decreasing difficulty");
return last_block.getDifficulty() - 1;
} else {
return 1;
}
}
/**
* Validate block by enforcing following rules: - Block must have the proper
* last_hash reference - Block must meet the proof of work requirements -
* difficulty must only adjust by one - block hash must be a valid combination
* of block fields
*
* @param last_block
* @param block
* @return
* @throws NoSuchAlgorithmException
*/
public static boolean is_valid_block(Block last_block, Block block) throws NoSuchAlgorithmException {
String binary_hash = CryptoHash.hex_to_binary(block.getHash());
char[] pow_array = CryptoHash.n_len_array('0', block.getDifficulty());
char[] binary_char_array = CryptoHash.string_to_charray(binary_hash);
if (!block.getLastHash().equalsIgnoreCase(last_block.getHash())) {
System.out.println("The last hash must be correct");
return false;
// Throw exception the last hash must be correct
}
if (!Arrays.equals(pow_array, Arrays.copyOfRange(binary_char_array, 0, block.getDifficulty()))) {
System.out.println("Proof of work requirement not met");
return false;
// throw exception - proof of work requirement not met
}
if (Math.abs(last_block.difficulty - block.difficulty) > 1) {
System.out.println("Block difficulty must adjust by one");
return false;
// throw exception: The block difficulty must only adjust by 1
}
String reconstructed_hash = CryptoHash.getSHA256(block.getTimestamp(), block.getLastHash(), block.getData(),
block.getDifficulty(), block.getNonce());
if (!block.getHash().equalsIgnoreCase(reconstructed_hash)) {
System.out.println("The block hash must be correct");
System.out.println(block.getHash());
System.out.println(reconstructed_hash);
return false;
// throw exception: the block hash must be correct
}
System.out.println("You have mined a valid block");
return true;
}
public int getDifficulty() {
return difficulty;
}
public long getTimestamp() {
return timestamp;
}
public String getHash() {
return hash;
}
public String getLastHash() {
return lastHash;
}
public String[] getData() {
return data;
}
public int getNonce() {
return nonce;
}
public static void main(String[] args) throws NoSuchAlgorithmException {
// String md = CryptoHash.getSHA256("foobar");
Block genesis = genesis_block();
System.out.println(genesis.toString());
// Block bad_block = Block.mine_block(genesis, new String[] { "watch", "AOT" });
// bad_block.lastHash = "evil data";
// System.out.println(bad_block.toString());
Block good_block = mine_block(genesis, new String[] { "foo", "bar" });
System.out.println(good_block.toString());
// System.out.println(mine_block(new_block, new String[] { "crypto", "is", "fun" }).toString());
// System.out.println(Block.is_valid_block(genesis, bad_block)); // returns false as expected
System.out.println(Block.is_valid_block(genesis, good_block));
System.out.println(CryptoHash.hex_to_binary(good_block.getHash()));
Block good_block2 = mine_block(good_block, new String[] { "bar", "foo" });
Block good_block3 = mine_block(good_block2, new String[] { "bar", "foo" });
Block good_block4 = mine_block(good_block3, new String[] { "bar", "foo" });
// Block good_block5 = mine_block(good_block4, new String[] {"bar", "foo"});
// Block good_block6 = mine_block(good_block5, new String[] {"bar", "foo"});
}
}
package privblock.gerald.ryan.util;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
public class CryptoHash {
static HashMap<Character, String> HEX_TO_BIN_TABLE;
static {
HEX_TO_BIN_TABLE = new HashMap<Character, String>();
HEX_TO_BIN_TABLE.put('0', "0000");
HEX_TO_BIN_TABLE.put('1', "0001");
HEX_TO_BIN_TABLE.put('2', "0010");
HEX_TO_BIN_TABLE.put('3', "0011");
HEX_TO_BIN_TABLE.put('4', "0100");
HEX_TO_BIN_TABLE.put('5', "0101");
HEX_TO_BIN_TABLE.put('6', "0110");
HEX_TO_BIN_TABLE.put('7', "0111");
HEX_TO_BIN_TABLE.put('8', "1000");
HEX_TO_BIN_TABLE.put('9', "1001");
HEX_TO_BIN_TABLE.put('a', "1010");
HEX_TO_BIN_TABLE.put('b', "1011");
HEX_TO_BIN_TABLE.put('c', "1100");
HEX_TO_BIN_TABLE.put('d', "1101");
HEX_TO_BIN_TABLE.put('e', "1110");
HEX_TO_BIN_TABLE.put('f', "1111");
}
public static String getSHA256(String... sarray) throws NoSuchAlgorithmException {
String s = concat(sarray);
// System.out.printf("Hashing \"%s\"\n", s);
MessageDigest md;
md = MessageDigest.getInstance("SHA-256");
byte[] b = md.digest(s.getBytes(StandardCharsets.UTF_8));
BigInteger number = new BigInteger(1, b);
StringBuilder hexString = new StringBuilder(number.toString(16));
while (hexString.length() < 32) {
hexString.insert(0, '0');
}
String mds = hexString.toString();
// System.out.printf("hash is:\n%s\n", mds);
return hexString.toString();
}
public static String getSHA256(long timestamp, String last_hash, String[] data, int difficulty, int nonce)
throws NoSuchAlgorithmException {
String s = "";
s += Long.toString(timestamp);
s += last_hash;
s += concat(data);
s += Integer.toString(difficulty);
s += Integer.toString(nonce);
// System.out.printf("Hashing \"%s\"\n", s);
MessageDigest md;
md = MessageDigest.getInstance("SHA-256");
byte[] b = md.digest(s.getBytes(StandardCharsets.UTF_8));
BigInteger number = new BigInteger(1, b);
StringBuilder hexString = new StringBuilder(number.toString(16));
// System.out.println(hexString);
while (hexString.length() < 32) {
hexString.insert(0, '0');
}
String messageDigestString = hexString.toString();
// System.out.printf("hash is:\n%s\n", messageDigestString);
return hexString.toString();
}
public static char[] n_len_array(char c, int n) {
char[] ch = new char[n];
for (int i = 0; i<n; i++) {
ch[i] = c;
}
return ch;
}
public static String n_len_string(char c, int n) {
String s = "";
for (int i = 0; i<n; i++) {
s += c;
}
return s;
}
public static String concat(String... args) {
String s = "";
for (String $ : args) {
s += $;
}
// System.out.println(s);
return s;
}
public static char[] string_to_charray(String str) {
char[] ch = new char[str.length()];
for (int i = 0; i < str.length(); i++) {
ch[i] = str.charAt(i);
}
return ch;
}
public static String string_to_hex(String arg) {
return String.format("%064x", new BigInteger(1, arg.getBytes(StandardCharsets.UTF_8)));
}
public static String hex_to_binary(String hex_string) {
String binary_string = "";
for (int i = 0; i < hex_string.length(); i++) {
binary_string += HEX_TO_BIN_TABLE.get(hex_string.charAt(i));
}
return binary_string;
}
public static String string_to_binary(String raw_string) {
String hex_string = string_to_hex(raw_string);
String bin_string = hex_to_binary(hex_string);
return bin_string;
}
}
ps here's an example of a log I created. I created other cleaner logs too but this shows what we're working with. The first item represents time in milliseconds. The second represents the first four digits of the hash, which is directly below it, followed by the level of difficulty requirement string (what the second item needs to be, length n = difficulty level). The hash just never leads with four zeros, ever, so my hash function or call to the function must be broken in some way.
6479
1000
1000001010111011100110111010100100111010101001111110010101011101101101110000110100110110110000001010001000000010110001100111100111010100110001001001110111011010011100110000011111110100000100000100000010100001000110000111000101100010001111011000110011111101
0000
6479
0101
0101110111010100101010100000001011100011000001110001011011001101001111101011010011000111101101111111001001001010100110101101100111111011001011100101111000011100010001000000000011000111010000101101001000001010101010111001010000101001110011111101011011011000
0000
6479
1000
1000000001000101001110001110110000110111001101100001011000111010111110001011011010011111111101011001110011001001111011011110110010101010101100011011001001110001100010010101001011100001101011011101010000000100111100011011110100000101100111010100100110011101
0000
6479
I figured out the problem. The hash does often have 4 leading zeroes, but the code as structured clips them off: BigInteger.toString(16) drops leading zeros, and the padding loop only pads up to 32 characters rather than the 64 hex characters of a SHA-256 digest. I noticed by logging that the binary string is not always a fixed 256 bits (64 hex characters) long. Here's the output:
256
1101111000010000100001110001010001010000001010111001100011010011110010001001010001010010100110111000110010000010001110110100100101000000001111111110011100000001010100000111001000111101010001010100110100000000111000100001000000010010010111011110110011110111
256
011001111101001000011111011001111110010110000011001011111010001011010110010100001011010011010010111101100010010111000010110010110111110001010101100000000101001000111110100111011100001110010010101011011000000101100001101110101101010001110000111111110000
252
0001100101110011101000000011000101011100111101110100111110100101110110011100010110001011000110010011110110011001100111010001100100011001011000001011100011011011011011101000111000011100100011011011011000101010011101000110101011000110011100111010000011000011
256
1100110001001001110001100111100010101100100010110111100111001010011011111111100010100110110000010000101000010111111010010101110001100010101010111111111111001011010111010100001010000010111100100100111000010101011000110000100000100111010001000011000000010000
256
So that's solved, or at least I understand the problem. It's amazing what sleep will do.
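One way to avoid the clipping, sketched here with illustrative names rather than as the poster's final code, is to pad the hex string to two characters per digest byte (64 for SHA-256) instead of 32. With the original loop, a digest whose first hex digit is 0 comes back one character short, so the string never starts with the hex digit 0 and the first four bits are never 0000.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FixedWidthHash {
    /** Hex-encodes a digest, keeping leading zeros (two hex characters per byte). */
    static String toHex(byte[] digest) {
        StringBuilder hex = new StringBuilder(new BigInteger(1, digest).toString(16));
        while (hex.length() < digest.length * 2) {
            hex.insert(0, '0');
        }
        return hex.toString();
    }

    /** SHA-256 of a UTF-8 string as a fixed-width, 64-character hex string. */
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return toHex(md.digest(input.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[] {0, 0x0f}));  // 000f
        System.out.println(sha256Hex("abc").length());    // 64
    }
}
```

With a fixed-width hex string, hex_to_binary always yields 256 bits, so a substring(0, difficulty) comparison sees the leading zeros.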

Email Validation in J2EE [duplicate]

What are the good email address validation libraries for Java? Are there any alternatives to commons validator?
Using the official Java email package (JavaMail) is the easiest:
public static boolean isValidEmailAddress(String email) {
boolean result = true;
try {
InternetAddress emailAddr = new InternetAddress(email);
emailAddr.validate();
} catch (AddressException ex) {
result = false;
}
return result;
}
Apache Commons is generally known as a solid project. Keep in mind, though, you'll still have to send a verification email to the address if you want to ensure it's a real email, and that the owner wants it used on your site.
EDIT: There was a bug where it was too restrictive on domain, causing it to not accept valid emails from new TLDs.
This bug was resolved on 03/Jan/15 02:48 in commons-validator version 1.4.1
Apache Commons validator can be used as mentioned in the other answers.
pom.xml:
<dependency>
<groupId>commons-validator</groupId>
<artifactId>commons-validator</artifactId>
<version>1.4.1</version>
</dependency>
build.gradle:
compile 'commons-validator:commons-validator:1.4.1'
The import:
import org.apache.commons.validator.routines.EmailValidator;
The code:
String email = "myName@example.com";
boolean valid = EmailValidator.getInstance().isValid(email);
and to allow local addresses
boolean allowLocal = true;
boolean valid = EmailValidator.getInstance(allowLocal).isValid(email);
Late answer, but I think it is simple and worthy:
public boolean isValidEmailAddress(String email) {
String ePattern = "^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
java.util.regex.Pattern p = java.util.regex.Pattern.compile(ePattern);
java.util.regex.Matcher m = p.matcher(email);
return m.matches();
}
For production purpose, Domain Name validations should be performed network-wise.
If you are trying to do a form validation received from the client, or just a bean validation - keep it simple.
It's better to do a loose email validation rather than to do a strict one and reject some people, (e.g. when they are trying to register for your web service).
With almost anything allowed in the username part of the email and so many new domains being added literally every month (e.g. .company, .entreprise, .estate), it's safer not to be restrictive:
Pattern pattern = Pattern.compile("^.+@.+\\..+$");
Matcher matcher = pattern.matcher(email);
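As a minimal, self-contained sketch of such a loose check (the class and method names are illustrative, and it assumes the separator in the pattern is @): it only requires something before the @, and a dot with text on both sides after it.

```java
import java.util.regex.Pattern;

public class LooseEmailCheck {
    // Deliberately permissive: "<anything>@<anything>.<anything>".
    private static final Pattern LOOSE_EMAIL = Pattern.compile("^.+@.+\\..+$");

    /** Returns true if the value has the rough shape of an email address. */
    static boolean looksLikeEmail(String value) {
        return value != null && LOOSE_EMAIL.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user@example.com")); // true
        System.out.println(looksLikeEmail("user@localhost"));   // false
        System.out.println(looksLikeEmail("no-at-sign"));       // false
    }
}
```

A check like this only filters obvious typos; whether the address is real still has to be confirmed by sending a verification email.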
Late to the question, here, but: I maintain a class at this address: http://lacinato.com/cm/software/emailrelated/emailaddress
It is based on Les Hazlewood's class, but has numerous improvements and fixes a few bugs. Apache license.
I believe it is the most capable email parser in Java, and I have yet to see one more capable in any language, though there may be one out there. It's not a lexer-style parser, but uses some complicated Java regex, and thus is not as efficient as it could be, but my company has parsed well over 10 billion real-world addresses with it: it's certainly usable in a high-performance situation. Maybe once a year it'll hit an address that causes a regex stack overflow (appropriately), but these are spam addresses which are hundreds or thousands of characters long with many, many quotes and parentheses and the like.
RFC 2822 and the related specs are really quite permissive in terms of email addresses, so a class like this is overkill for most uses. For example, the following is a legitimate address, according to spec, spaces and all:
"<bob \" (here) " < (hi there) "bob(the man)smith" (hi) @ (there) example.com (hello) > (again)
No mail server would allow that, but this class can parse it (and rewrite it to a usable form).
We found the existing Java email parser options to be insufficiently durable (meaning, all of them could not parse some valid addresses), so we created this class.
The code is well-documented and has a lot of easy-to-change options to allow or disallow certain email forms. It also provides a lot of methods to access certain parts of the address (left-hand side, right-hand side, personal names, comments, etc), to parse/validate mailbox-list headers, to parse/validate the return-path (which is unique among the headers), and so forth.
The code as written has a javamail dependency, but it's easy to remove if you don't want the minor functionality it provides.
I'm just wondering why nobody came up with @Email from Hibernate Validator's additional constraints. The validator itself is EmailValidator.
Les Hazlewood has written a very thorough RFC 2822 compliant email validator class using Java regular expressions. You can find it at http://www.leshazlewood.com/?p=23. However, its thoroughness (or the Java RE implementation) leads to inefficiency - read the comments about parsing times for long addresses.
I ported some of the code in Zend_Validator_Email:
@FacesValidator("emailValidator")
public class EmailAddressValidator implements Validator {
private String localPart;
private String hostName;
private boolean domain = true;
Locale locale;
ResourceBundle bundle;
private List<FacesMessage> messages = new ArrayList<FacesMessage>();
private HostnameValidator hostnameValidator;
@Override
public void validate(FacesContext context, UIComponent component, Object value) throws ValidatorException {
setOptions(component);
String email = (String) value;
boolean result = true;
Pattern pattern = Pattern.compile("^(.+)@([^@]+[^.])$");
Matcher matcher = pattern.matcher(email);
locale = context.getViewRoot().getLocale();
bundle = ResourceBundle.getBundle("com.myapp.resources.validationMessages", locale);
boolean length = true;
boolean local = true;
if (matcher.find()) {
localPart = matcher.group(1);
hostName = matcher.group(2);
if (localPart.length() > 64 || hostName.length() > 255) {
length = false;
addMessage("enterValidEmail", "email.AddressLengthExceeded");
}
if (domain == true) {
hostnameValidator = new HostnameValidator();
hostnameValidator.validate(context, component, hostName);
}
local = validateLocalPart();
if (local && length) {
result = true;
} else {
result = false;
}
} else {
result = false;
addMessage("enterValidEmail", "invalidEmailAddress");
}
if (result == false) {
throw new ValidatorException(messages);
}
}
private boolean validateLocalPart() {
// First try to match the local part on the common dot-atom format
boolean result = false;
// Dot-atom characters are: 1*atext *("." 1*atext)
// atext: ALPHA / DIGIT / and "!", "#", "$", "%", "&", "'", "*",
// "+", "-", "/", "=", "?", "^", "_", "`", "{", "|", "}", "~"
String atext = "a-zA-Z0-9\\u0021\\u0023\\u0024\\u0025\\u0026\\u0027\\u002a"
+ "\\u002b\\u002d\\u002f\\u003d\\u003f\\u005e\\u005f\\u0060\\u007b"
+ "\\u007c\\u007d\\u007e";
Pattern regex = Pattern.compile("^["+atext+"]+(\\u002e+["+atext+"]+)*$");
Matcher matcher = regex.matcher(localPart);
if (matcher.find()) {
result = true;
} else {
// Try quoted string format
// Quoted-string characters are: DQUOTE *([FWS] qtext/quoted-pair) [FWS] DQUOTE
// qtext: Non white space controls, and the rest of the US-ASCII characters not
// including "\" or the quote character
String noWsCtl = "\\u0001-\\u0008\\u000b\\u000c\\u000e-\\u001f\\u007f";
String qText = noWsCtl + "\\u0021\\u0023-\\u005b\\u005d-\\u007e";
String ws = "\\u0020\\u0009";
regex = Pattern.compile("^\\u0022(["+ws+qText+"])*["+ws+"]?\\u0022$");
matcher = regex.matcher(localPart);
if (matcher.find()) {
result = true;
} else {
addMessage("enterValidEmail", "email.AddressDotAtom");
addMessage("enterValidEmail", "email.AddressQuotedString");
addMessage("enterValidEmail", "email.AddressInvalidLocalPart");
}
}
return result;
}
private void addMessage(String detail, String summary) {
String detailMsg = bundle.getString(detail);
String summaryMsg = bundle.getString(summary);
messages.add(new FacesMessage(FacesMessage.SEVERITY_ERROR, summaryMsg, detailMsg));
}
private void setOptions(UIComponent component) {
Boolean domainOption = Boolean.valueOf((String) component.getAttributes().get("domain"));
//domain = (domainOption == null) ? true : domainOption.booleanValue();
}
}
With a hostname validator as follows:
@FacesValidator("hostNameValidator")
public class HostnameValidator implements Validator {
private Locale locale;
private ResourceBundle bundle;
private List<FacesMessage> messages;
private boolean checkTld = true;
private boolean allowLocal = false;
private boolean allowDNS = true;
private String tld;
private String[] validTlds = {"ac", "ad", "ae", "aero", "af", "ag", "ai",
"al", "am", "an", "ao", "aq", "ar", "arpa", "as", "asia", "at", "au",
"aw", "ax", "az", "ba", "bb", "bd", "be", "bf", "bg", "bh", "bi", "biz",
"bj", "bm", "bn", "bo", "br", "bs", "bt", "bv", "bw", "by", "bz", "ca",
"cat", "cc", "cd", "cf", "cg", "ch", "ci", "ck", "cl", "cm", "cn", "co",
"com", "coop", "cr", "cu", "cv", "cx", "cy", "cz", "de", "dj", "dk",
"dm", "do", "dz", "ec", "edu", "ee", "eg", "er", "es", "et", "eu", "fi",
"fj", "fk", "fm", "fo", "fr", "ga", "gb", "gd", "ge", "gf", "gg", "gh",
"gi", "gl", "gm", "gn", "gov", "gp", "gq", "gr", "gs", "gt", "gu", "gw",
"gy", "hk", "hm", "hn", "hr", "ht", "hu", "id", "ie", "il", "im", "in",
"info", "int", "io", "iq", "ir", "is", "it", "je", "jm", "jo", "jobs",
"jp", "ke", "kg", "kh", "ki", "km", "kn", "kp", "kr", "kw", "ky", "kz",
"la", "lb", "lc", "li", "lk", "lr", "ls", "lt", "lu", "lv", "ly", "ma",
"mc", "md", "me", "mg", "mh", "mil", "mk", "ml", "mm", "mn", "mo",
"mobi", "mp", "mq", "mr", "ms", "mt", "mu", "museum", "mv", "mw", "mx",
"my", "mz", "na", "name", "nc", "ne", "net", "nf", "ng", "ni", "nl",
"no", "np", "nr", "nu", "nz", "om", "org", "pa", "pe", "pf", "pg", "ph",
"pk", "pl", "pm", "pn", "pr", "pro", "ps", "pt", "pw", "py", "qa", "re",
"ro", "rs", "ru", "rw", "sa", "sb", "sc", "sd", "se", "sg", "sh", "si",
"sj", "sk", "sl", "sm", "sn", "so", "sr", "st", "su", "sv", "sy", "sz",
"tc", "td", "tel", "tf", "tg", "th", "tj", "tk", "tl", "tm", "tn", "to",
"tp", "tr", "travel", "tt", "tv", "tw", "tz", "ua", "ug", "uk", "um",
"us", "uy", "uz", "va", "vc", "ve", "vg", "vi", "vn", "vu", "wf", "ws",
"ye", "yt", "yu", "za", "zm", "zw"};
private Map<String, Map<Integer, Integer>> idnLength;
private void init() {
Map<Integer, Integer> biz = new HashMap<Integer, Integer>();
biz.put(5, 17);
biz.put(11, 15);
biz.put(12, 20);
Map<Integer, Integer> cn = new HashMap<Integer, Integer>();
cn.put(1, 20);
Map<Integer, Integer> com = new HashMap<Integer, Integer>();
com.put(3, 17);
com.put(5, 20);
Map<Integer, Integer> hk = new HashMap<Integer, Integer>();
hk.put(1, 15);
Map<Integer, Integer> info = new HashMap<Integer, Integer>();
info.put(4, 17);
Map<Integer, Integer> kr = new HashMap<Integer, Integer>();
kr.put(1, 17);
Map<Integer, Integer> net = new HashMap<Integer, Integer>();
net.put(3, 17);
net.put(5, 20);
Map<Integer, Integer> org = new HashMap<Integer, Integer>();
org.put(6, 17);
Map<Integer, Integer> tw = new HashMap<Integer, Integer>();
tw.put(1, 20);
Map<Integer, Integer> idn1 = new HashMap<Integer, Integer>();
idn1.put(1, 20);
Map<Integer, Integer> idn2 = new HashMap<Integer, Integer>();
idn2.put(1, 20);
Map<Integer, Integer> idn3 = new HashMap<Integer, Integer>();
idn3.put(1, 20);
Map<Integer, Integer> idn4 = new HashMap<Integer, Integer>();
idn4.put(1, 20);
idnLength = new HashMap<String, Map<Integer, Integer>>();
idnLength.put("BIZ", biz);
idnLength.put("CN", cn);
idnLength.put("COM", com);
idnLength.put("HK", hk);
idnLength.put("INFO", info);
idnLength.put("KR", kr);
idnLength.put("NET", net);
idnLength.put("ORG", org);
idnLength.put("TW", tw);
idnLength.put("ایران", idn1);
idnLength.put("中国", idn2);
idnLength.put("公司", idn3);
idnLength.put("网络", idn4);
messages = new ArrayList<FacesMessage>();
}
public HostnameValidator() {
init();
}
@Override
public void validate(FacesContext context, UIComponent component, Object value) throws ValidatorException {
String hostName = (String) value;
locale = context.getViewRoot().getLocale();
bundle = ResourceBundle.getBundle("com.myapp.resources.validationMessages", locale);
Pattern ipPattern = Pattern.compile("^[0-9a-f:\\.]*$", Pattern.CASE_INSENSITIVE);
Matcher ipMatcher = ipPattern.matcher(hostName);
if (ipMatcher.find()) {
addMessage("hostname.IpAddressNotAllowed");
throw new ValidatorException(messages);
}
boolean result = false;
// removes last dot (.) from hostname
hostName = hostName.replaceAll("(\\.)+$", "");
String[] domainParts = hostName.split("\\.");
boolean status = false;
// Check input against DNS hostname schema
if ((domainParts.length > 1) && (hostName.length() > 4) && (hostName.length() < 255)) {
status = false;
dowhile:
do {
// First check TLD
int lastIndex = domainParts.length - 1;
String domainEnding = domainParts[lastIndex];
Pattern tldRegex = Pattern.compile("([^.]{2,10})", Pattern.CASE_INSENSITIVE);
Matcher tldMatcher = tldRegex.matcher(domainEnding);
if (tldMatcher.find() || domainEnding.equals("ایران")
|| domainEnding.equals("中国")
|| domainEnding.equals("公司")
|| domainEnding.equals("网络")) {
// Hostname characters are: *(label dot)(label dot label); max 254 chars
// label: id-prefix [*ldh{61} id-prefix]; max 63 chars
// id-prefix: alpha / digit
// ldh: alpha / digit / dash
// Match TLD against known list
tld = (String) tldMatcher.group(1).toLowerCase().trim();
if (checkTld == true) {
boolean foundTld = false;
for (int i = 0; i < validTlds.length; i++) {
if (tld.equals(validTlds[i])) {
foundTld = true;
}
}
if (foundTld == false) {
status = false;
addMessage("hostname.UnknownTld");
break dowhile;
}
}
/**
* Match against IDN hostnames
* Note: Keep label regex short to avoid issues with long patterns when matching IDN hostnames
*/
List<String> regexChars = getIdnRegexChars();
// Check each hostname part
int check = 0;
for (String domainPart : domainParts) {
// Decode Punycode domainnames to IDN
if (domainPart.indexOf("xn--") == 0) {
domainPart = decodePunycode(domainPart.substring(4));
}
// Check dash (-) does not start, end or appear in 3rd and 4th positions
if (domainPart.indexOf("-") == 0
|| (domainPart.length() > 2 && domainPart.indexOf("-", 2) == 2 && domainPart.indexOf("-", 3) == 3)
|| (domainPart.indexOf("-") == (domainPart.length() - 1))) {
status = false;
addMessage("hostname.DashCharacter");
break dowhile;
}
// Check each domain part
boolean checked = false;
for (int key = 0; key < regexChars.size(); key++) {
String regexChar = regexChars.get(key);
Pattern regex = Pattern.compile(regexChar);
Matcher regexMatcher = regex.matcher(domainPart);
status = regexMatcher.find();
if (status) {
int length = 63;
if (idnLength.containsKey(tld.toUpperCase())
&& idnLength.get(tld.toUpperCase()).containsKey(key)) {
length = idnLength.get(tld.toUpperCase()).get(key);
}
int utf8Length;
try {
utf8Length = domainPart.getBytes("UTF8").length;
if (utf8Length > length) {
addMessage("hostname.InvalidHostname");
} else {
checked = true;
break;
}
} catch (UnsupportedEncodingException ex) {
Logger.getLogger(HostnameValidator.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
if (checked) {
++check;
}
}
// If one of the labels doesn't match, the hostname is invalid
if (check != domainParts.length) {
status = false;
addMessage("hostname.InvalidHostnameSchema");
}
} else {
// Hostname not long enough
status = false;
addMessage("hostname.UndecipherableTld");
}
} while (false);
if (status == true && allowDNS) {
result = true;
}
} else if (allowDNS == true) {
addMessage("hostname.InvalidHostname");
throw new ValidatorException(messages);
}
// Check input against local network name schema;
Pattern regexLocal = Pattern.compile("^(([a-zA-Z0-9\\x2d]{1,63}\\x2e)*[a-zA-Z0-9\\x2d]{1,63}){1,254}$", Pattern.CASE_INSENSITIVE);
boolean checkLocal = regexLocal.matcher(hostName).find();
if (allowLocal && !status) {
if (checkLocal) {
result = true;
} else {
// If the input does not pass as a local network name, add a message
result = false;
addMessage("hostname.InvalidLocalName");
}
}
// If local network names are not allowed, add a message
if (checkLocal && !allowLocal && !status) {
result = false;
addMessage("hostname.LocalNameNotAllowed");
}
if (result == false) {
throw new ValidatorException(messages);
}
}
private void addMessage(String msg) {
String bundlMsg = bundle.getString(msg);
messages.add(new FacesMessage(FacesMessage.SEVERITY_ERROR, bundlMsg, bundlMsg));
}
/**
* Returns a list of regex patterns for the matched TLD
* @param tld
* @return
*/
private List<String> getIdnRegexChars() {
List<String> regexChars = new ArrayList<String>();
regexChars.add("^[a-z0-9\\x2d]{1,63}$");
Document doc = null;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
try {
InputStream validIdns = getClass().getClassLoader().getResourceAsStream("com/myapp/resources/validIDNs_1.xml");
DocumentBuilder builder = factory.newDocumentBuilder();
doc = builder.parse(validIdns);
doc.getDocumentElement().normalize();
} catch (SAXException ex) {
Logger.getLogger(HostnameValidator.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(HostnameValidator.class.getName()).log(Level.SEVERE, null, ex);
} catch (ParserConfigurationException ex) {
Logger.getLogger(HostnameValidator.class.getName()).log(Level.SEVERE, null, ex);
}
// prepare XPath
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = null;
String xpathRoute = "//idn[tld=\'" + tld.toUpperCase() + "\']/pattern/text()";
try {
XPathExpression expr;
expr = xpath.compile(xpathRoute);
Object res = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) res;
} catch (XPathExpressionException ex) {
Logger.getLogger(HostnameValidator.class.getName()).log(Level.SEVERE, null, ex);
}
for (int i = 0; i < nodes.getLength(); i++) {
regexChars.add(nodes.item(i).getNodeValue());
}
return regexChars;
}
/**
* Decode Punycode string
* @param encoded
* @return
*/
private String decodePunycode(String encoded) {
Pattern regex = Pattern.compile("([^a-z0-9\\x2d]{1,10})", Pattern.CASE_INSENSITIVE);
Matcher matcher = regex.matcher(encoded);
boolean found = matcher.find();
if (encoded.isEmpty() || found) {
// no punycode encoded string, return as is
addMessage("hostname.CannotDecodePunycode");
throw new ValidatorException(messages);
}
int separator = encoded.lastIndexOf("-");
List<Integer> decoded = new ArrayList<Integer>();
if (separator > 0) {
for (int x = 0; x < separator; ++x) {
decoded.add((int) encoded.charAt(x));
}
} else {
addMessage("hostname.CannotDecodePunycode");
throw new ValidatorException(messages);
}
int lengthd = decoded.size();
int lengthe = encoded.length();
// decoding
boolean init = true;
int base = 72;
int index = 0;
int ch = 0x80;
// Start decoding right after the last '-' separator (the basic code points precede it)
int indexeStart = (separator > 0) ? (separator + 1) : 0;
for (int indexe = indexeStart; indexe < lengthe; ++lengthd) {
int oldIndex = index;
int pos = 1;
for (int key = 36; true; key += 36) {
int hex = (int) encoded.charAt(indexe++);
int digit = (hex - 48 < 10) ? hex - 22
: ((hex - 65 < 26) ? hex - 65
: ((hex - 97 < 26) ? hex - 97
: 36));
index += digit * pos;
int tag = (key <= base) ? 1 : ((key >= base + 26) ? 26 : (key - base));
if (digit < tag) {
break;
}
pos = (int) (pos * (36 - tag));
}
int delta = (int) (init ? ((index - oldIndex) / 700) : ((index - oldIndex) / 2));
delta += (int) (delta / (lengthd + 1));
int key;
for (key = 0; delta > 910; key += 36) {
delta = (int) (delta / 35);
}
base = (int) (key + 36 * delta / (delta + 38));
init = false;
ch += (int) (index / (lengthd + 1));
index %= (lengthd + 1);
if (lengthd > 0) {
for (int i = lengthd; i > index; i--) {
decoded.set(i, decoded.get(i - 1));
}
}
decoded.set(index++, ch);
}
// convert decoded ucs4 to utf8 string
StringBuilder sb = new StringBuilder();
for (int i = 0; i < decoded.size(); i++) {
int value = decoded.get(i);
if (value < 128) {
sb.append((char) value);
} else if (value < (1 << 11)) {
sb.append((char) (192 + (value >> 6)));
sb.append((char) (128 + (value & 63)));
} else if (value < (1 << 16)) {
sb.append((char) (224 + (value >> 12)));
sb.append((char) (128 + ((value >> 6) & 63)));
sb.append((char) (128 + (value & 63)));
} else if (value < (1 << 21)) {
sb.append((char) (240 + (value >> 18)));
sb.append((char) (128 + ((value >> 12) & 63)));
sb.append((char) (128 + ((value >> 6) & 63)));
sb.append((char) (128 + (value & 63)));
} else {
addMessage("hostname.CannotDecodePunycode");
throw new ValidatorException(messages);
}
}
return sb.toString();
}
/**
* Eliminates empty values from input array
* @param data
* @return
*/
private String[] verifyArray(String[] data) {
List<String> result = new ArrayList<String>();
for (String s : data) {
if (!s.equals("")) {
result.add(s);
}
}
return result.toArray(new String[result.size()]);
}
}
And a validIDNs.xml with regex patterns for the different tlds (too big to include:)
<idnlist>
<idn>
<tld>AC</tld>
<pattern>^[\u002d0-9a-zà-öø-ÿāăąćĉċčďđēėęěĝġģĥħīįĵķĺļľŀłńņňŋőœŕŗřśŝşšţťŧūŭůűųŵŷźżž]{1,63}$</pattern>
</idn>
<idn>
<tld>AR</tld>
<pattern>^[\u002d0-9a-zà-ãç-êìíñ-õü]{1,63}$</pattern>
</idn>
<idn>
<tld>AS</tld>
<pattern>^[\u002d0-9a-zà-öø-ÿāăąćĉċčďđēĕėęěĝğġģĥħĩīĭįıĵķĸĺļľłńņňŋōŏőœŕŗřśŝşšţťŧũūŭůűųŵŷźż]{1,63}$</pattern>
</idn>
<idn>
<tld>AT</tld>
<pattern>^[\u002d0-9a-zà-öø-ÿœšž]{1,63}$</pattern>
</idn>
<idn>
<tld>BIZ</tld>
<pattern>^[\u002d0-9a-zäåæéöøü]{1,63}$</pattern>
<pattern>^[\u002d0-9a-záéíñóúü]{1,63}$</pattern>
<pattern>^[\u002d0-9a-záéíóöúüőű]{1,63}$</pattern>
</idn>
</idnlist>
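As an alternative to hand-porting a Punycode decoder like decodePunycode above, the JDK ships java.net.IDN, which converts between the ASCII ("xn--") and Unicode forms of a host name. The following is a minimal sketch, not a drop-in replacement for the validator. The class and method names are hypothetical, and the sample host is only an illustration.

```java
import java.net.IDN;

public class PunycodeLabelDemo {
    // Decodes an ASCII-compatible ("xn--") host name to its Unicode form.
    // IDN.toUnicode returns the input unchanged if it cannot be converted.
    public static String toUnicodeHost(String asciiHost) {
        return IDN.toUnicode(asciiHost);
    }

    // Encodes a Unicode host name to its ASCII-compatible form.
    // IDN.toASCII throws IllegalArgumentException for input it cannot convert.
    public static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    public static void main(String[] args) {
        System.out.println(toUnicodeHost("xn--bcher-kva.example"));
        System.out.println(toAsciiHost("b\u00fccher.example"));
    }
}
```

Decoding first and then applying the per-TLD character patterns keeps the regex checks working on the Unicode form, which is what the patterns in the XML file describe.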
public class Validations {
private Pattern regexPattern;
private Matcher regMatcher;
public String validateEmailAddress(String emailAddress) {
regexPattern = Pattern.compile("^[(a-zA-Z-0-9-\\_\\+\\.)]+@[(a-z-A-z)]+\\.[(a-zA-z)]{2,3}$");
regMatcher = regexPattern.matcher(emailAddress);
if(regMatcher.matches()) {
return "Valid Email Address";
} else {
return "Invalid Email Address";
}
}
public String validateMobileNumber(String mobileNumber) {
regexPattern = Pattern.compile("^\\+[0-9]{2,3}+-[0-9]{10}$");
regMatcher = regexPattern.matcher(mobileNumber);
if(regMatcher.matches()) {
return "Valid Mobile Number";
} else {
return "Invalid Mobile Number";
}
}
public static void main(String[] args) {
String emailAddress = "suryaprakash.pisay@gmail.com";
String mobileNumber = "+91-9986571622";
Validations validations = new Validations();
System.out.println(validations.validateEmailAddress(emailAddress));
System.out.println(validations.validateMobileNumber(mobileNumber));
}
}
If you're looking to verify whether an email address is valid, then VRFY will get you some of the way. I've found it's useful for validating intranet addresses (that is, email addresses for internal sites). However it's less useful for internet mail servers (see the caveats at the top of this page)
Although there are many alternatives to Apache commons, their implementations are rudimentary at best (like Apache commons' implementation itself) and even dead wrong in other cases.
I'd also stay away from so-called simple 'non-restrictive' regex; there's no such thing. For example, @ is allowed multiple times depending on context, so how do you know the required one is there? A simple regex won't understand it, even though the email should be valid. Anything more complex becomes error-prone or even contains hidden performance killers. How are you going to maintain something like this?
The only comprehensive RFC compliant regex based validator I'm aware of is email-rfc2822-validator with its 'refined' regex appropriately named Dragons.java. It supports only the older RFC-2822 spec, which is nonetheless appropriate enough for modern needs (RFC-5322 updates it in areas already out of scope for daily use cases).
But really what you want is a lexer that properly parses a string and breaks it up into the component structure according to the RFC grammar. EmailValidator4J seems promising in that regard, but is still young and limited.
Another option you have is using a webservice such as Mailgun's battle-tested validation webservice or Mailboxlayer API (just took the first Google results). It is not strictly RFC compliant, but works well enough for modern needs.
What do you want to validate? The email address?
The email address can only be checked for its format conformance. See the standard: RFC2822. The best way to do that is a regular expression. You will never know if it really exists without sending an email.
I checked the commons validator. It contains an org.apache.commons.validator.EmailValidator class. Seems to be a good starting point.
Current Apache Commons Validator version is 1.3.1.
Class that validates is org.apache.commons.validator.EmailValidator. It has an import for org.apache.oro.text.perl.Perl5Util which is from a retired Jakarta ORO project.
BTW, I found that there is a 1.4 version, here are the API docs. The site says: "Last Published: 05 March 2008 | Version: 1.4-SNAPSHOT", but that's not final. The only way to use it is to build it yourself (but this is a snapshot, not a RELEASE) or download it from here. This means 1.4 has not been made final for three years (2008-2011). This is not Apache's style.
I'm looking for a better option, but didn't find one that is widely adopted. I want to use something that is well tested; I don't want to hit any bugs.
You may also want to check for the length - emails are a maximum of 254 chars long. I use the apache commons validator and it doesn't check for this.
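A length check of this kind can be layered in front of whatever validator you use. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, the 254 limit comes from the remark above, the 64-character local-part limit is the one used in the Zend port earlier on this page, and lengths are counted in Java chars (which equals octets only for ASCII input).

```java
public class EmailLengthCheck {
    static final int MAX_ADDRESS_LENGTH = 254;   // overall limit mentioned above
    static final int MAX_LOCAL_PART_LENGTH = 64; // limit used by the Zend port on this page

    // Hypothetical pre-check: rejects addresses that are too long,
    // leaving all format decisions to the real validator.
    public static boolean hasAcceptableLength(String email) {
        if (email == null || email.length() > MAX_ADDRESS_LENGTH) {
            return false;
        }
        int at = email.lastIndexOf('@');
        // No '@' at all: not this method's job to reject it.
        return at < 0 || at <= MAX_LOCAL_PART_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(hasAcceptableLength("user@example.com"));
    }
}
```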
There don't seem to be any perfect libraries or ways to do this yourself, unless you have the time to send an email to the email address and wait for a response (this might not be an option though). I ended up using a suggestion from here http://blog.logichigh.com/2010/09/02/validating-an-e-mail-address/ and adjusting the code so it would work in Java.
public static boolean isValidEmailAddress(String email) {
boolean stricterFilter = true;
String stricterFilterString = "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}";
String laxString = ".+@.+\\.[A-Za-z]{2}[A-Za-z]*";
String emailRegex = stricterFilter ? stricterFilterString : laxString;
java.util.regex.Pattern p = java.util.regex.Pattern.compile(emailRegex);
java.util.regex.Matcher m = p.matcher(email);
return m.matches();
}
This is the best method:
public static boolean isValidEmail(String enteredEmail){
String EMAIL_REGIX = "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$";
// Check for null first: calling isEmpty() or matcher() on null would throw
if (enteredEmail == null || enteredEmail.isEmpty()) {
return false;
}
Pattern pattern = Pattern.compile(EMAIL_REGIX);
Matcher matcher = pattern.matcher(enteredEmail);
return matcher.matches();
}
Sources:-
http://howtodoinjava.com/2014/11/11/java-regex-validate-email-address/
http://www.rfc-editor.org/rfc/rfc5322.txt
Another option is to use the Hibernate email validator, either with the @Email annotation or by using the validator class programmatically, like:
import org.hibernate.validator.internal.constraintvalidators.hv.EmailValidator;
class Validator {
// code
private boolean isValidEmail(String email) {
EmailValidator emailValidator = new EmailValidator();
return emailValidator.isValid(email, null);
}
}
Here's my pragmatic approach, where I just want reasonably distinct blah@domain addresses using the allowable characters from the RFC. Addresses must be converted to lowercase beforehand.
public class EmailAddressValidator {
private static final String domainChars = "a-z0-9\\-";
private static final String atomChars = "a-z0-9\\Q!#$%&'*+-/=?^_`{|}~\\E";
private static final String emailRegex = "^" + dot(atomChars) + "@" + dot(domainChars) + "$";
private static final Pattern emailPattern = Pattern.compile(emailRegex);
private static String dot(String chars) {
return "[" + chars + "]+(?:\\.[" + chars + "]+)*";
}
public static boolean isValidEmailAddress(String address) {
return address != null && emailPattern.matcher(address).matches();
}
}

Decompress a string with more nested strings

The assignment consists of decompressing a string. In particular, the code has to work for the 3 samples illustrated in the picture.
My code works for the first 2 samples. However, I cannot get the 3rd sample to work. I probably did not properly understand the concept of recursion. Can you help me?
import java.util.Scanner;
public class Compression4 {
public static void main(String[] args)
{
Scanner in = new Scanner(System.in);
String input=in.next();
System.out.println(uncompress(input));
}
public static boolean flag = true;
public static String uncompress(String compressedText)
{
return uncompress(compressedText, "", "");
}
public static String getMultiple(String x, int N) {
if (N == 0) return "";
return ""+x+getMultiple(x,N-1);
}
public static String uncompress(String text, String count, String output)
{
if (text.equals(""))
{
return output;
}
if(text.charAt(0) == '(')
{
int FirstIndex = text.indexOf("(")+1;
String inner = text.substring(FirstIndex, text.lastIndexOf(")"));
//System.out.println(inner);
flag = false;
return uncompress (inner, count, output);
}
else if (Character.isLetter(text.charAt(0)))
{
//letter case - need to take the count we have accrued, parse it into an integer and add to output
if (flag==true)
{
//System.out.println(count);// * text.charAt(0);
String s = String.valueOf(text.charAt(0));
output += getMultiple(s,Integer.parseInt(count));
count ="1";
}
else
{
//System.out.println(count);// * text.charAt(0);
output += getMultiple(text,Integer.parseInt(count));
//System.out.println("output: "+output);
count="0";
}
}
else if(Character.isDigit(text.charAt(0)))
{
//digit case - need to add to the count but keep as a string because must be parsed later
if(flag)
count += (""+text.charAt(0));
else
{
count = "0";
count += (""+text.charAt(0));
}
}
//parse the *remainder* of the string, one character at a time, so pass in the substring(1)
return uncompress(text.substring(1), count, output);
}
}
Sorry for the long code, but it's easier to explain with code than with words.
Premise:
I think of the problem as an interpreter for a language that renders a string
the language is simple and functional, so recursive interpretation is possible
Algorithm phases:
First: tokenize the expression (to work at a higher level of abstraction)
Second: parse the expression just tokenized
Recursion: the logic is based on the syntax of the language. Key concepts of a recursion:
the base cases and the recursive cases
the state necessary for a single recursion (local variables of the recursion, those passed as parameters to the recursive method)
the state for the whole recursion (global variables of the recursion, those read/written in some specific recursion)
I've added many comments to explain what the algorithm is doing. If it's not clear I can explain it better.
import java.util.ArrayList;
import java.util.List;
public class TestStringDecompression {
// simpleExpr examples: a | b | 123a | 123b | 123(a) | 123(ab) | 123(ba) | (ab) | (ba)
// 11ab = aaaaaaaaaaab = expression = simpleExpr simpleExpr = 11a b
// 4(ab) = abababab = expression = simpleExpr = 4(ab)
// 2(3b3(ab)) = bbbabababbbbababab = expression = compositeExpr = 2 ( simpleExpr simpleExpr ) = 2 ( 3b 3(ab) )
public static void main(String[] args) {
System.out.println(new StringInflater().inflate("11ab"));
System.out.println(new StringInflater().inflate("4(ab)"));
System.out.println(new StringInflater().inflate("2(3b3(ab))"));
}
public static class StringInflater {
// This stores the position of the last parsed token
private int posLastParsedToken = 0;
public String inflate(String expression) {
return parse(tokenize(expression), 0, false);
}
/**
* Language tokens:
* <ul>
* <li>literals:
* <ul>
* <li>intLiteral = [0-9]*</li>
* <li>charLiteral = [ab]</li>
* </ul>
* </li>
* <li>separators:
* <ul>
* <li>leftParen = '('</li>
* <li>rightParen = ')'</li>
* </ul>
* </li>
* </ul>
*/
private Object[] tokenize(String expression) {
List<Object> tokens = new ArrayList<Object>();
int i = 0;
while (i < expression.length()) {
if ('0' <= expression.charAt(i) && expression.charAt(i) <= '9') {
String number = "";
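// check the bound first, otherwise an expression ending in a digit throws StringIndexOutOfBoundsException
while (i < expression.length() && '0' <= expression.charAt(i) && expression.charAt(i) <= '9') {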
number += expression.charAt(i++);
}
tokens.add(Integer.valueOf(number));
} else {
tokens.add(expression.charAt(i++));
}
}
return tokens.toArray(new Object[tokens.size()]);
}
/**
* Language syntax:
* <ul>
* <li>simpleExpr = [intLiteral] charLiteral | [intLiteral] leftParen charLiteral+ rightParen</li>
* <li>compositeExpr = [intLiteral] leftParen (simpleExpr | compositeExpr)+ rightParen</li>
* <li>expression = (simpleExpr | compositeExpr)+</li>
* </ul>
*/
private String parse(Object[] tokens, int pos, boolean nested) {
posLastParsedToken = pos;
String result = "";
if (tokens[pos] instanceof Integer) {
/** it's a intLiteral */
// get quantifier value
int repetition = (int) tokens[pos];
// lookahead for (
if (tokens[pos + 1].equals('(')) { // tokens are Character objects, so compare with a char, not a String
// composite repetition, it could be:
// simpleExpr: "[intLiteral] leftParen charLiteral+ rightParen"
// compositeExpr: "[intLiteral] leftParen (simpleExpr | compositeExpr)+ rightParen"
result = parse(tokens, pos + 1, true);
} else {
// simple repetition, it could be:
// simpleExpr: [intLiteral] charLiteral
result = parse(tokens, pos + 1, false);
}
result = repeat(result, repetition);
// evaluate the rest of the expression because syntax allows it
if (posLastParsedToken + 1 == tokens.length) {
// end of the expression
return result;
} else {
// there are other simpleExpr or compositeExpr to parse
return result + parse(tokens, posLastParsedToken + 1, false);
}
} else if (tokens[pos].equals('(')) {
/** it's a leftParen */
// an open paren means what follow this token is considered nested (useful for string to treat as char sequence)
return parse(tokens, pos + 1, true);
} else if (tokens[pos].equals(')')) {
/** it's a rightParen */
// a closed paren, nothing to render
return "";
} else {
/** it's a charLiteral */
if (nested) {
// it's nested between paren, so more parsing is requested to consume next charLiteral or next simpleExpr or compositeExpr
return tokens[pos] + parse(tokens, pos + 1, nested);
} else {
// it's not nested between paren, return charLiteral as is
return "" + tokens[pos];
}
}
}
private String repeat(String s, int repetition) {
StringBuilder result = new StringBuilder();
for (int i = 0; i < repetition; i++) {
result.append(s);
}
return result.toString();
}
}
}
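For comparison, the same recursive idea can be written without a separate tokenizer by keeping a cursor into the string. This is a sketch under stated assumptions, not the answer's code: the class and method names are hypothetical, the grammar is "item = [count] (letter | '(' sequence ')')", and a missing count is treated as 1.

```java
public class Decompressor {
    private final String text;
    private int pos = 0; // cursor shared by all recursive calls

    private Decompressor(String text) {
        this.text = text;
    }

    public static String decompress(String compressed) {
        Decompressor d = new Decompressor(compressed);
        String result = d.parseSequence();
        if (d.pos != compressed.length()) {
            throw new IllegalArgumentException("unexpected ')' at " + d.pos);
        }
        return result;
    }

    // sequence = item* ; stops at ')' or at the end of the input (base case)
    private String parseSequence() {
        StringBuilder out = new StringBuilder();
        while (pos < text.length() && text.charAt(pos) != ')') {
            out.append(parseItem());
        }
        return out.toString();
    }

    // item = [count] ( letter | '(' sequence ')' )
    private String parseItem() {
        int count = 0;
        boolean hasCount = false;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) {
            count = count * 10 + (text.charAt(pos++) - '0');
            hasCount = true;
        }
        if (!hasCount) {
            count = 1;
        }
        if (pos >= text.length() || text.charAt(pos) == ')') {
            throw new IllegalArgumentException("count without operand at " + pos);
        }
        String unit;
        if (text.charAt(pos) == '(') {
            pos++; // consume '('
            unit = parseSequence(); // recursive case: nested group
            if (pos >= text.length()) {
                throw new IllegalArgumentException("missing ')'");
            }
            pos++; // consume ')'
        } else {
            unit = String.valueOf(text.charAt(pos++));
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < count; i++) {
            out.append(unit);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decompress("11ab"));
        System.out.println(decompress("4(ab)"));
        System.out.println(decompress("2(3b3(ab))"));
    }
}
```

The cursor plays the role of the "state for the whole recursion", while the count and the repeated unit are local to each call.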

Use JLine to Complete Multiple Commands on One Line

I was wondering how I could implement an ArgumentCompleter such that if I complete a full and valid command, then it would begin tab completing for a new command.
I would have assumed it could be constructed doing something like this:
final ConsoleReader consoleReader = new ConsoleReader();
final ArgumentCompleter cyclicalArgument = new ArgumentCompleter();
cyclicalArgument.getCompleters().addAll(Arrays.asList(
new StringsCompleter("foo"),
new StringsCompleter("bar"),
cyclicalArgument));
consoleReader.addCompleter(cyclicalArgument);
consoleReader.readLine();
However, right now this stops working after tab completing the first foo bar.
Is anyone familiar enough with the library to tell me how I would go about implementing this? Or is there a known way to do this that I am missing? Also this is using JLine2.
That was quite a task :-)
It is handled by the completer you are using. The completer's complete() method has to search using only what comes after the last blank.
If you look at the library's FileNameCompleter, for example, this is not done at all, so you will find no completion: the completer searches for <input1> <input2> and not only for <input2> :-)
You will have to write your own completer implementation that is able to find input2.
Additionally, the CompletionHandler has to append what you found to what you have already typed.
Here is a basic implementation changing the default FileNameCompleter:
protected int matchFiles(final String buffer, final String translated, final File[] files,
        final List<CharSequence> candidates) {
    // THIS IS NEW
    String[] allWords = translated.split(" ");
    String lastWord = allWords[allWords.length - 1];
    // the lastWord is used when searching the files now
    // ---
    if (files == null) {
        return -1;
    }
    int matches = 0;
    // first pass: just count the matches
    for (File file : files) {
        if (file.getAbsolutePath().startsWith(lastWord)) {
            matches++;
        }
    }
    // second pass: add every matching file to the candidates
    for (File file : files) {
        if (file.getAbsolutePath().startsWith(lastWord)) {
            CharSequence name = file.getName() + (matches == 1 && file.isDirectory() ? this.separator() : " ");
            candidates.add(this.render(file, name).toString());
        }
    }
    final int index = buffer.lastIndexOf(this.separator());
    return index + this.separator().length();
}
And here is the complete() method of the CompletionHandler, changing the default CandidateListCompletionHandler:
@Override
public boolean complete(final ConsoleReader reader, final List<CharSequence> candidates, final int pos)
        throws IOException {
    CursorBuffer buf = reader.getCursorBuffer();
    // THIS IS NEW
    // collect every word except the last one; each is followed by a blank
    String[] allWords = buf.toString().split(" ");
    String firstWords = "";
    if (allWords.length > 1) {
        for (int i = 0; i < allWords.length - 1; i++) {
            firstWords += allWords[i] + " ";
        }
    }
    //-----
    // if there is only one completion, then fill in the buffer
    if (candidates.size() == 1) {
        String value = Ansi.stripAnsi(candidates.get(0).toString());
        if (buf.cursor == buf.buffer.length() && this.printSpaceAfterFullCompletion && !value.endsWith(" ")) {
            value += " ";
        }
        // fail if the only candidate is the same as the current buffer
        if (value.equals(buf.toString())) {
            return false;
        }
        // firstWords already ends with a blank, so no extra separator is needed
        CandidateListCompletionHandler.setBuffer(reader, firstWords + value, pos);
        return true;
    } else if (candidates.size() > 1) {
        String value = this.getUnambiguousCompletions(candidates);
        CandidateListCompletionHandler.setBuffer(reader, value, pos);
    }
    CandidateListCompletionHandler.printCandidates(reader, candidates);
    // redraw the current console buffer
    reader.drawLine();
    return true;
}
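Both snippets above depend on splitting the typed line at its last blank. Here is a minimal, library-independent sketch of just that step. The class and method names (LastWordSplitter, prefix, lastWord) are made up for illustration and are not part of JLine. It uses lastIndexOf instead of split(" ") because split drops trailing empty strings, so a line ending in a blank would otherwise report the previous word as the last one:

```java
/** Hypothetical helper, not part of JLine: splits a command line at its last blank. */
final class LastWordSplitter {

    private LastWordSplitter() {
    }

    /** Returns everything up to and including the last blank, or "" if there is none. */
    static String prefix(String line) {
        return line.substring(0, line.lastIndexOf(' ') + 1);
    }

    /** Returns the text after the last blank, i.e. the part a completer should search for. */
    static String lastWord(String line) {
        return line.substring(line.lastIndexOf(' ') + 1);
    }

    public static void main(String[] args) {
        String typed = "foo bar fo";
        System.out.println("[" + prefix(typed) + "]");   // prints [foo bar ]
        System.out.println("[" + lastWord(typed) + "]"); // prints [fo]
        // A trailing blank means a new, still empty word has been started.
        System.out.println("[" + lastWord("foo bar ") + "]"); // prints []
    }
}
```

A completer could search with lastWord(...) and the handler could write back prefix(...) plus the chosen candidate, but how that plugs into your JLine version is something you would have to check yourself.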

RegEx -- Parsing strings in-between ${ }s but NOT '${ }'s

I want a regular expression that will extract _A_, 12345, Non_Literal_Left, and Non_Literal_Right from the following string:
... ${_A_}, ${12345}, '${Literal}' $yada yada'$[]' '${Non_Literal_Left} ${Non_Literal_Right}'
The closest I have been able to get handles everything except the single-quote restriction for literals:
Matcher matcher = Pattern.compile("\\$\\{(\\w+)\\}").matcher(s);
while (matcher.find()) {
    result.add(matcher.group(1));
}
This yields everything I want plus Literal, which I do not want to match.
Thanks in advance...
You could simply use a negative lookbehind:
"(?<!')\\$\\{(\\w+)\\}"
This will now only match if the $ is not preceded by '.
As Matt Ball mentioned in a comment, it might make sense to add a negative lookahead to the end, too:
"(?<!')\\$\\{(\\w+)\\}(?!')"
However, this will only matter if you have invalid/unmatched usage of ' as in ${Literal}' (in this case my first regex will still match Literal, the latter won't).
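To see the difference between the two patterns, here is a small runnable sketch. The class name LookaroundDemo and the helper extract are only there for the demo. Note the last case: a parameter that merely sits right after a quote (like Non_Literal_Left in the question's sample) is skipped by the lookbehind as well, so the patterns may need further adjustment for that input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundDemo {

    private LookaroundDemo() {
    }

    /** Collects group 1 of every match of the given regex in the input. */
    static List<String> extract(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String lookbehindOnly = "(?<!')\\$\\{(\\w+)\\}";
        String withLookahead = "(?<!')\\$\\{(\\w+)\\}(?!')";

        // A fully quoted literal is skipped by both patterns.
        System.out.println(extract(lookbehindOnly, "${_A_}, '${Literal}'")); // prints [_A_]

        // Unmatched trailing quote: only the lookahead variant rejects it.
        System.out.println(extract(lookbehindOnly, "${Literal}'")); // prints [Literal]
        System.out.println(extract(withLookahead, "${Literal}'"));  // prints []

        // A quote directly in front of a parameter also blocks the match.
        System.out.println(extract(lookbehindOnly,
                "'${Non_Literal_Left} ${Non_Literal_Right}'")); // prints [Non_Literal_Right]
    }
}
```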
That was a joy.
Something tells me a regular expression would have been a little cleaner.
/**
* Utility class for parsing record field parameters in properties.
*
* @author Ryan
*
*/
public static class PropertyParser {
/**
* Stores the results of parsing a property.
*
* @author Ryan
*
*/
public static class ParsedParameters {
private final Set<String> literals;
private final Set<String> parameters;
private ParsedParameters() {
this.parameters = new HashSet<String>();
this.literals = new HashSet<String>();
}
/**
* Adds a literal property value to this object.
*
* @param string The literal property value to add to this object.
*/
private void addLiteral(String string) {
this.literals.add(string);
}
/**
* Adds a parameter name to this object.
*
* @param string The parameter name to add to this object.
*/
private void addParameter(String string) {
this.parameters.add(string);
}
/**
* Returns the set of literals stored in this object.
*
* @return The set of literals stored in this object.
*/
public Set<String> getLiterals() {
return this.literals;
}
/**
* Returns the set of parameters stored in this object.
*
* @return The set of parameters stored in this object.
*/
public Set<String> getParameters() {
return this.parameters;
}
}
private static final String BAD_FIELD_CHAR =
"Illegal character detected for field parameter: %c";
/**
* Extracts placeholder field name parameters from the input string.
* <p>
* Single quotes can be used to avoid the parser interpreting the ${...}
* as a field parameter.
* <p>
* For example, the parser would not detect any field parameters in the following string:
* <p>
* #!/bin/bash<br>
* # Echos the first argument<br>
* echo '${1}'<br>
* <p>
* The {@link #PropertySubstitutor()} is responsible for removing the single quotes
* surrounding the parameter when substituting the actual property value(s).
* <p>
* <b>Nested Parameters</b>
* <p>
* This parser itself will only parse the inner-most parameter or literal.
* <p>
* For example, ${Some${Value}} would actually be treated as a legal string, with
* 'Value' as the only field parameter extracted. During runtime substitution,
* this would result in ${Somebody} if the record value for the field "Value" was "body".
* <p>
* Theoretically, this parser could then be run again to extract this generated parameter.
*
* @param string The property to parse for field parameters.
* @return An object containing the parsed parameters and literal values.
* @throws IllegalArgumentException If the property contains parameter syntax
* (i.e. ${text}) but contains illegal characters for the field.
* <p>
* Allowed characters for field names are alpha-numeric and underscores.
*/
public static ParsedParameters parseParametersAndLiterals(String string)
throws IllegalArgumentException {
if ((string == null) || string.isEmpty()) {
return new ParsedParameters();
}
ParsedParameters result = new ParsedParameters();
StringBuffer param = null;
Character badChar = null;
char c;
char p = '^';
boolean close = false;
boolean lQuote = false;
boolean open = false;
int l = string.length();
for (int i = 0; i < l; ++i) {
c = string.charAt(i);
if (!lQuote && (p == '\'') && (c == '$')) {
lQuote = true;
} else if ((p == '$') && (c == '{')) {
param = new StringBuffer();
open = true;
badChar = null;
} else if (open
&& (((c == '}') && (!lQuote || ((1 + i) == l))) || (lQuote && (p == '}')))) {
open = false;
close = true;
} else if (open) {
boolean validCharacter = Character.isLetterOrDigit(c) || (c == '_');
if (validCharacter || (lQuote && (c != '}'))) {
param.append(c);
}
if (!validCharacter && (c != '}')) {
badChar = c;
}
}
if (close) {
if ((badChar != null) && !(lQuote && (p == '}') && (c == '\''))) {
throw new IllegalArgumentException(String.format(BAD_FIELD_CHAR, badChar));
} else if (c != '\'') {
if (param.length() > 0) {
result.addParameter(param.toString());
}
} else {
result.addLiteral(param.toString());
}
lQuote = false;
close = false;
badChar = null;
}
p = c;
}
return result;
}
}
And tests, of course.
public class TestPropertyParser {
private Set<String> literals;
private Set<String> params;
private void assertLiteralsContains(String string) {
assertTrue(this.literals.contains(string));
}
private void assertParamsContains(String string) {
assertTrue(this.params.contains(string));
}
private void assertResultSizes(Integer paramSize, Integer literalSize) {
if (paramSize != null) {
assertNotNull(this.params);
assertEquals((int) paramSize, this.params.size());
} else {
assertNull(this.params);
}
if (literalSize != null) {
assertNotNull(this.literals);
assertEquals((int) literalSize, this.literals.size());
} else {
assertNull(this.literals);
}
}
private void parseAndSet(String stringToParse) {
ParsedParameters result = PropertyParser.parseParametersAndLiterals(stringToParse);
this.literals = result.getLiterals();
this.params = result.getParameters();
}
@Before
public void setup() {
this.params = new HashSet<String>();
this.literals = new HashSet<String>();
}
@Test(expected = IllegalArgumentException.class)
public void testParserInvalidParameterQuoteLeft() {
parseAndSet("'${Invalid Parameter}");
}
@Test(expected = IllegalArgumentException.class)
public void testParserInvalidParameterQuoteRight() {
parseAndSet("${Invalid Parameter}'");
}
@Test(expected = IllegalArgumentException.class)
public void testParserInvalidParameterSpaces() {
parseAndSet(" ${Invalid Parameter}");
}
@Test
public void testParserValidStrings() {
// Initialization condition.
assertResultSizes(0, 0);
// Null string.
parseAndSet(null);
assertResultSizes(0, 0);
// Empty string.
parseAndSet(new String());
assertResultSizes(0, 0);
// Single parameter.
parseAndSet("... ${_A_}, $yada yada'$[]' '${");
assertResultSizes(1, 0);
assertParamsContains("_A_");
// Many parameters and one literal.
parseAndSet("... ${_A_}, ${12345}, '${Literal}''${Non_Literal_Left} ${Non_Literal_Right}' ");
assertResultSizes(4, 1);
assertParamsContains("_A_");
assertParamsContains("12345");
assertParamsContains("Non_Literal_Left");
assertParamsContains("Non_Literal_Right");
assertLiteralsContains("Literal");
// Nested literal and odd bracket placements.
parseAndSet("''${Totally}''$}{$'${Single}");
assertResultSizes(1, 1);
assertParamsContains("Single");
assertLiteralsContains("Totally");
// Subset of ASCII characters.
parseAndSet("`1234567890-=qwertyuiop[]\\asdfghjkl;'zxcvbnm,./!@#$%^&*()_+QWERTYUIOP{}|ASDFGHJKL:\"ZXCVBNM<>?");
assertResultSizes(0, 0);
// Illegal characters in literal.
parseAndSet("'${This literal is completely valid}'");
assertResultSizes(0, 1);
assertLiteralsContains("This literal is completely valid");
// Test incomplete literal, no closure.
parseAndSet("'${This literal is never closed");
assertResultSizes(0, 0);
// Test incomplete parameter from left.
parseAndSet("${Never_Closed");
assertResultSizes(0, 0);
// And again... with a parameter at the end.
parseAndSet("${Never_Closed${But_This_Is}");
assertResultSizes(1, 0);
assertParamsContains("But_This_Is");
// Empty parameter.
parseAndSet("${}");
assertResultSizes(0, 0);
// Restarting a new parameter within an already open parameter.
parseAndSet("${Perfectly valid${a}");
assertResultSizes(1, 0);
assertParamsContains("a");
// Variation of the above with quotes.
parseAndSet("'${Perfectly valid'${a}");
assertResultSizes(1, 0);
assertParamsContains("a");
// Variation of the above with quotes.
parseAndSet("${Perfectly valid'${a}");
assertResultSizes(1, 0);
assertParamsContains("a");
// Variation of the above with quotes.
parseAndSet("${Perfectly valid${a}'");
assertResultSizes(1, 0);
assertParamsContains("a");
// Variation of the above with quotes.
parseAndSet("${Perfectly valid'${a}'");
assertResultSizes(0, 1);
assertLiteralsContains("a");
// Variation of the above with spaces.
parseAndSet(" ${ Perfectly valid${a} ");
assertResultSizes(1, 0);
assertParamsContains("a");
// TODO Determine what the desired behavior is for nested literals and parameters.
// Test nested parameter in literal.
parseAndSet("'${Nested ${Parameter}}'");
assertResultSizes(1, 0);
assertParamsContains("Parameter");
// Nested parameter.
parseAndSet("${Nested_${Parameter}}'");
assertResultSizes(1, 0);
assertParamsContains("Parameter");
// Literal nested in a parameter.
parseAndSet(" ${Nested'${Literal}'}");
assertResultSizes(0, 1);
assertLiteralsContains("Literal");
}
}
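For comparison with the hand-written parser above, here is a rough regex-based sketch of the same classification. It is not the author's code, and the class name RegexPropertyParser is invented for this example. It assumes a literal is a ${...} wrapped in single quotes on both sides and a parameter is any other ${name} made of word characters. It does not reproduce every edge case covered by the tests above: it never throws IllegalArgumentException, and it makes no attempt at nesting.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based variant; only an approximation of the parser above. */
final class RegexPropertyParser {

    // Alternative 1: a fully quoted '${...}' literal, content captured in group 1.
    // Alternative 2: a plain ${name} parameter, name captured in group 2.
    // The quoted alternative is tried first, so a quoted literal is consumed
    // before the parameter alternative can see it.
    private static final Pattern TOKEN =
            Pattern.compile("'\\$\\{([^{}']+)\\}'|\\$\\{(\\w+)\\}");

    final Set<String> literals = new LinkedHashSet<>();
    final Set<String> parameters = new LinkedHashSet<>();

    static RegexPropertyParser parse(String input) {
        RegexPropertyParser result = new RegexPropertyParser();
        if (input == null) {
            return result;
        }
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            if (matcher.group(1) != null) {
                result.literals.add(matcher.group(1));
            } else {
                result.parameters.add(matcher.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RegexPropertyParser parsed = parse(
                "... ${_A_}, ${12345}, '${Literal}' $yada yada'$[]' "
                        + "'${Non_Literal_Left} ${Non_Literal_Right}'");
        System.out.println(parsed.parameters); // prints [_A_, 12345, Non_Literal_Left, Non_Literal_Right]
        System.out.println(parsed.literals);   // prints [Literal]
    }
}
```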
