I am creating an application which simply scrambles a string, and can put it back together. It follows a simple cipher. I am using code like this:
String oldstr = "Hello";
String newstr = oldstr.replace("e", "l").replace("l", "t");
I only put a tiny bit because if I wrote out the entire thing, it would be huge.
On to the problem. The way the program works, it first replaces the "e" in "Hello" with an "l", turning the string into "Hlllo". Then it replaces the "l"s with "t"s. However, I don't want the "e" to eventually become a "t", since then I can't turn it back into the original. The way I want this application to work, the outcome would be "Hltto". Is there a way to make the replacements apply simultaneously, so that a character produced by one replacement is not changed again by a later one?
EDIT:
I am not looking for answers that only work for this scenario (in my actual application I have 26 characters being changed).
A cipher should be reversible.
That means, all operations must be reversible. A replace operation is not because it maps two different characters to one. (yours maps both 'e' and 'l' to 'l')
You'd have to build a table of what each character becomes, and change each character based on that table. For example:
a → ...
b → ...
c → ...
d → ...
e → l
...
l → t
...
Be sure that there are no duplicates on the right side.
Then iterate over each character in the string and build a new string.
StringBuilder sb = new StringBuilder(oldStr.length());
for (char c : oldStr.toCharArray())
    sb.append( replaceChar(c) );
newStr = sb.toString();
The function replaceChar(c) can be as simple as a look-up table, or for more coding convenience, can be a 26-entry switch statement, or (as in real cryptography) a key-dependent mathematical function.
Also, be aware of what should happen between upper and lower case letters and what happens to other characters (like 'à', 'ÿ', '気', ...)
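As a rough sketch of the table idea above (not the only way to do it): the class name SubstitutionCipher and the 26-letter key string are assumptions made for this example. The constructor rejects keys with duplicates on the right side, and the reverse table is built at the same time so that decoding is just another look-up.

```java
import java.util.Arrays;

class SubstitutionCipher {
    private final char[] forward = new char[26];  // plain letter  -> cipher letter
    private final char[] backward = new char[26]; // cipher letter -> plain letter

    // key.charAt(i) is the replacement for the letter 'a' + i (hypothetical key format).
    SubstitutionCipher(String key) {
        if (key.length() != 26) {
            throw new IllegalArgumentException("key must have 26 letters");
        }
        Arrays.fill(backward, '\0');
        for (int i = 0; i < 26; i++) {
            char c = key.charAt(i);
            // A letter that was already used as a target would make decoding ambiguous.
            if (c < 'a' || c > 'z' || backward[c - 'a'] != '\0') {
                throw new IllegalArgumentException("key must be a permutation of a-z");
            }
            forward[i] = c;
            backward[c - 'a'] = (char) ('a' + i);
        }
    }

    String encode(String s) { return translate(s, forward); }

    String decode(String s) { return translate(s, backward); }

    private static String translate(String s, char[] table) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            // Anything outside a-z (upper case, digits, accented letters) passes through unchanged.
            sb.append(c >= 'a' && c <= 'z' ? table[c - 'a'] : c);
        }
        return sb.toString();
    }
}
```

With the key "abcdlfghijktmnopqrseuvwxyz" (e → l, l → t, t → e, everything else unchanged), encode("Hello") gives "Hltto" and decode("Hltto") gives "Hello" again.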
Consider processing a string char-by-char using a Map for replacements:
// replacement map
static final Map<Character, Character> REPLACEMENTS = new HashMap<>();
// fill up replacements
static {
for (String r : "el,lt,te,ab,ba".split(",")) // add all replacement pairs you need
REPLACEMENTS.put(r.charAt(0), r.charAt(1)); // e→l, l→t, t→e, a→b, ...
}
// encoding method
static public String encode(String input) {
StringBuilder sb = new StringBuilder();
for (char c : input.toCharArray())
sb.append(REPLACEMENTS.getOrDefault(c, c));
return sb.toString();
}
To check how it's working, call:
System.out.println(encode("Hello")); // Hltto
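For the way back, one option is to invert the replacement map. The following is only a sketch: the class ReversibleMap and its method names are made up for this example, and the table is passed in as a parameter instead of being a static field.

```java
import java.util.HashMap;
import java.util.Map;

class ReversibleMap {
    // Builds the decoding table; fails if two characters map to the same target.
    static Map<Character, Character> invert(Map<Character, Character> forward) {
        Map<Character, Character> backward = new HashMap<>();
        for (Map.Entry<Character, Character> e : forward.entrySet()) {
            if (backward.put(e.getValue(), e.getKey()) != null) {
                throw new IllegalArgumentException("duplicate target: " + e.getValue());
            }
        }
        return backward;
    }

    // Same char-by-char loop as encode(), but with the table passed in.
    static String translate(String input, Map<Character, Character> table) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            sb.append(table.getOrDefault(c, c));
        }
        return sb.toString();
    }
}
```

Note that a duplicate-free right side is only enough if every target character is itself remapped (as with e → l, l → t, t → e). Otherwise an unmapped 't' in the input would collide with the 't' produced from 'l'.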
One way it can be accomplished is to keep a bitmap/bit string/array of binary values (call it whatever you want) in which each bit represents one letter of the string. Before ciphering, all bits are 0. When you change a letter, you go to the bit array and flip the corresponding bit. This way you know that it was changed, so the next pass will change only letters whose bit is still 0.
Or you can do all your ciphering in just one pass.
for (int i = 0; i < oldstring.length(); i++) {
    switch (oldstring.charAt(i)) {
        // ..... (one case per character) .....
        default: // do nothing
    }
}
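The flag-array idea described above could look roughly like this. It is a sketch under assumptions: the class FlaggedReplace and the {from, to} rule format are invented for the example, and a boolean[] stands in for the bit array.

```java
class FlaggedReplace {
    // Applies each {from, to} rule in order, but never touches a position
    // that an earlier rule has already changed.
    static String apply(String input, char[][] rules) {
        char[] chars = input.toCharArray();
        boolean[] changed = new boolean[chars.length]; // all false initially
        for (char[] rule : rules) {
            for (int i = 0; i < chars.length; i++) {
                if (!changed[i] && chars[i] == rule[0]) {
                    chars[i] = rule[1];
                    changed[i] = true; // later passes skip this position
                }
            }
        }
        return new String(chars);
    }
}
```

Calling apply("Hello", new char[][] {{'e', 'l'}, {'l', 't'}}) yields "Hltto" instead of "Httto", because the 'l' produced from 'e' is already flagged when the second rule runs.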
I have a somewhat unusual problem. I am currently trying to program a chat filter for Discord in Java 16.
Here I ran into the problem that in German there are several ways to write a word to get around this filter.
As an example I now take the insult "Hurensohn".
Now you could simply write "Huränsohn" or "Hur3nsohn" in the chat and thus bypass the filter quite easily.
Since I don't want to manually pack every possibility into the filter, I thought about how I could do it automatically. So the first thing I did was to create a HashMap with all possible alternative letters, which looked something like this:
Map<String, List<String>> alternativeCharacters = new HashMap<>();
alternativeCharacters.put( "E", List.of( "ä", "3" ) );
I tried to change the corresponding letters in the words and add them to the chat filter, which actually worked.
But now we come to the problem:
To be able to cover all possible combinations, it doesn't do me much good to change only one type of letter in a word.
If we take the word "Einschalter" and change the letter "e", we could replace it with either a "3" or an "ä", which would give the following:
3inschalt3r
Einschalt3r
3inschalter
and
Äinschaltär
Einschaltär
Äinschalter
But now I also want "mixed" words to be created, e.g. "3inschaltär", where both the "Ä" and the "3" are used in one word. The following combinations would then come out:
3inschaltär
Äinschalt3r
Does anyone know how I can realize something like that? With the normal replace() method I haven't yet found a way to create "mixed" replacements.
I hope people understand what kind of problem I have and what I want to do. :D
Current method used for replacing:
public static List<String> replace( String word, String from, String... to ) {
final int[] index = { 0 };
List<String> strings = new ArrayList<>();
/* Replaces all letters */
List.of( to ).forEach( value -> strings.add( word.replaceAll( from, value ) ) );
/* Here is the problem. Here only one letter is edited at a time and thus changed in the word */
List.of( to ).forEach( value -> {
List.of( word.split( "" ) ).forEach( letters -> {
if ( letters.equalsIgnoreCase( from ) ) {
strings.add( word.substring( 0, index[0] ) + value + "" + word.substring( index[0] + 1 ) );
}
index[0]++;
} );
index[0] = 0;
} );
return strings;
}
As said by others, you can’t keep up with the creativity of people. But if you want to continue using such a check, you should use the right tool for the job, i.e. a RuleBasedCollator.
RuleBasedCollator c = new RuleBasedCollator("<i,I=1=!<e=ä,E=3=Ä<o=0,O");
c.setStrength(Collator.PRIMARY);
String a = "3inschaltär", b = "Einschalter";
if(c.compare(a, b) == 0) {
System.out.println(a + " matches " + b);
}
3inschaltär matches Einschalter
This class even allows efficient hash lookups
// using c from above
// prepare map
var map = new HashMap<CollationKey, String>();
for(String s: List.of("Einschalter", "Hicks-Boson")) {
map.put(c.getCollationKey(s), s);
}
// use map for lookup
for(String s: List.of("Ä!nschalt3r", "H1cks-B0sOn")) {
System.out.println(s);
String match = map.get(c.getCollationKey(s));
if(match != null) System.out.println("\ta variant of " + match);
}
Ä!nschalt3r
a variant of Einschalter
H1cks-B0sOn
a variant of Hicks-Boson
While a Collator can be used for sorting, you’re only interested in identifying equal strings. Therefore, I didn’t care to specify a useful order, which simplifies the rules, as we only need to specify the characters supposed to be equal.
The linked documentation explains the syntax; in short, I=1=! defines the characters I, 1, and ! as equal, whereas the preceding i, defines i to be a different case of those characters. Likewise, e=ä,E=3=Ä defines e as equal to ä, with both being a different case from the characters E, 3, and Ä. Finally, the < separator defines characters to be different. It also defines a sorting order which, as said, we don’t care about in this usage.
As an addendum, the following can be used to remove accents and other marking from characters, except for umlauts, as you want to match German words. This would remove the requirement to deal with an exploding number of obfuscated character combinations, especially from people who know about Zalgo text converters:
String s = "òñę ảëîöū";
String n = Normalizer.normalize(s, Normalizer.Form.NFD)
.replaceAll("(?!(?<=[aou])\u0308)\\p{Mn}", "");
System.out.println(s + " -> " + n);
òñę ảëîöū -> one aeiöu
Off the top of my head, you may try to approach this using regular expressions, compiling patterns by replacing the respective letters where multiple ways of writing may occur in your dictionary.
E.g. in the direction of:
record LetterReplacements(String letter, List<String> replacements){}
public Predicate<String> generatePredicateForDictionaryWord(String word){
var letterA = new LetterReplacements("a", List.of("a", "A", "4"));
var writingStyles = letterA.replacements.stream()
.collect(Collectors.joining("|", "(", ")"));
var pattern = word.replaceAll(letterA.letter, writingStyles);
return Pattern.compile(pattern).asPredicate();
}
Example usage:
@ParameterizedTest
@CsvSource({
"maus,true",
"m4us,true",
"mAus,true",
"mous,false"
})
void testDictionaryPredicates(String word, boolean expectedResult) {
var predicate = underTest.generatePredicateForDictionaryWord("maus");
assertThat(predicate.test(word)).isEqualTo(expectedResult);
}
However, I doubt that any approach in this direction would perform well enough, especially since I expect your dictionary to grow rather fast and the number of different writing "styles" to be rather large.
So please regard the snippet above only as an illustration of the approach I was talking about. Again, I doubt the performance would be sufficient, even if all patterns and predicate combinations were precompiled beforehand.
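To cover several letters at once (and thereby the "mixed" spellings from the question), the single-letter snippet above could be generalized by turning every letter that has alternatives into a character class. This is only a sketch: the class VariantPattern and the ALTERNATIVES table are assumptions for illustration, and the table entries must not contain characters that are special inside a regex character class.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

class VariantPattern {
    // Hypothetical table: base letter -> characters accepted in its place.
    // \u00e4 is 'ä' and \u00c4 is 'Ä' (written as escapes to be encoding-safe).
    static final Map<Character, String> ALTERNATIVES = Map.of(
            'e', "eE\u00e4\u00c43",
            'a', "aA4",
            'i', "iI1!",
            'o', "oO0");

    // Builds one pattern that accepts every combination of alternatives.
    static Pattern forWord(String word) {
        StringBuilder regex = new StringBuilder();
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            String alternatives = ALTERNATIVES.get(c);
            if (alternatives == null) {
                regex.append(Pattern.quote(String.valueOf(c)));
            } else {
                regex.append('[').append(alternatives).append(']');
            }
        }
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }
}
```

For example, forWord("Einschalter").matcher("3inschaltär").matches() is true, and so is the same call with "Äinschalt3r", without ever generating the individual variants. The performance concerns mentioned above still apply when there are many dictionary words.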
I want to replace some strings in a String input :
string=string.replace("<h1>","<big><big><big><b>");
string=string.replace("</h1>","</b></big></big></big>");
string=string.replace("<h2>","<big><big>");
string=string.replace("</h2>","</big></big>");
string=string.replace("<h3>","<big>");
string=string.replace("</h3>","</big>");
string=string.replace("<h4>","<b>");
string=string.replace("</h4>","</b>");
string=string.replace("<h5>","<small><b>");
string=string.replace("</h5>","</b></small>");
string=string.replace("<h6>","<small>");
string=string.replace("</h6>","</small>");
As you can see this approach is not the best, because each time I have to search for the portion to replace etc, and Strings are immutable... Also the input is large, which means that some performance issues are to be considered.
Is there any better approach to reduce the complexity of this code ?
Although StringBuilder.replace() is a huge improvement compared to String.replace(), it is still very far from being optimal.
The problem with StringBuilder.replace() is that if the replacement has different length than the replaceable part (applies to our case), a bigger internal char array might have to be allocated, and the content has to be copied, and then the replace will occur (which also involves copying).
Imagine this: You have a text with 10,000 characters. If you want to replace the "XY" substring found at position 1 (2nd character) with "ABC", the implementation has to reallocate a char buffer which is at least larger by 1, has to copy the old content to the new array, and has to move 9,997 characters (starting at position 3) to the right by 1 to fit "ABC" into the place of "XY"; finally the characters of "ABC" are copied in, starting at position 1. This has to be done for every replace! This is slow.
Faster Solution: Building Output On-The-Fly
We can build the output on-the-fly: parts that don't contain replaceable texts can simply be appended to the output, and if we find a replaceable fragment, we append the replacement instead of it. Theoretically it's enough to loop over the input only once to generate the output. Sounds simple, and it's not that hard to implement it.
Implementation:
We will use a Map preloaded with mappings of the replaceable-replacement strings:
Map<String, String> map = new HashMap<>();
map.put("<h1>", "<big><big><big><b>");
map.put("</h1>", "</b></big></big></big>");
map.put("<h2>", "<big><big>");
map.put("</h2>", "</big></big>");
map.put("<h3>", "<big>");
map.put("</h3>", "</big>");
map.put("<h4>", "<b>");
map.put("</h4>", "</b>");
map.put("<h5>", "<small><b>");
map.put("</h5>", "</b></small>");
map.put("<h6>", "<small>");
map.put("</h6>", "</small>");
And using this, here is the replacer code: (more explanation after the code)
public static String replaceTags(String src, Map<String, String> map) {
StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);
for (int pos = 0;;) {
int ltIdx = src.indexOf('<', pos);
if (ltIdx < 0) {
// No more '<', we're done:
sb.append(src, pos, src.length());
return sb.toString();
}
sb.append(src, pos, ltIdx); // Copy chars before '<'
// Check if our hit is replaceable:
boolean mismatch = true;
for (Entry<String, String> e : map.entrySet()) {
String key = e.getKey();
if (src.regionMatches(ltIdx, key, 0, key.length())) {
// Match, append the replacement:
sb.append(e.getValue());
pos = ltIdx + key.length();
mismatch = false;
break;
}
}
if (mismatch) {
sb.append('<');
pos = ltIdx + 1;
}
}
}
Testing it:
String in = "Yo<h1>TITLE</h1><h3>Hi!</h3>Nice day.<h6>Hi back!</h6>End";
System.out.println(in);
System.out.println(replaceTags(in, map));
Output: (wrapped to avoid scroll bar)
Yo<h1>TITLE</h1><h3>Hi!</h3>Nice day.<h6>Hi back!</h6>End
Yo<big><big><big><b>TITLE</b></big></big></big><big>Hi!</big>Nice day.
<small>Hi back!</small>End
This solution is faster than using regular expressions, as those involve much overhead, like compiling a Pattern, creating a Matcher etc., and regexp is also much more general than we need. It also creates many temporary objects under the hood which are thrown away after the replace. Here I only use a StringBuilder (plus the char array under its hood) and the code iterates over the input String only once. This solution is also much faster than using StringBuilder.replace(), as detailed at the top of this answer.
Notes and Explanation
I initialized the StringBuilder in the replaceTags() method like this:
StringBuilder sb = new StringBuilder(src.length() + src.length() / 2);
So basically I created it with an initial capacity of 150% of the length of the original String. This is because our replacements are longer than the replaceable texts, so if replacing occurs, the output will be longer than the input. Giving a larger initial capacity to StringBuilder reduces, and often avoids entirely, internal char[] reallocation (the capacity actually required depends on the replaceable-replacement pairs and their frequency in the input, so this +50% is only a rough estimate).
I also utilized the fact that all replaceable strings start with a '<' character, so finding the next potential replaceable position becomes blazing-fast:
int ltIdx = src.indexOf('<', pos);
It's just a simple loop and char comparisons inside String, and since it always starts searching from pos (and not from the start of the input), overall the code iterates over the input String only once.
And finally, to tell if a replaceable String does occur at the potential position, we use the String.regionMatches() method to check the replaceable strings, which is also blazing-fast, as all it does is compare char values in a loop and return at the very first mismatching character.
And a PLUS:
The question doesn't mention it, but our input is an HTML document. HTML tags are case-insensitive which means the input might contain <H1> instead of <h1>.
To this algorithm this is not a problem. The regionMatches() in the String class has an overload which supports case-insensitive comparison:
boolean regionMatches(boolean ignoreCase, int toffset, String other,
int ooffset, int len);
So if we want to modify our algorithm to also find and replace input tags which are the same but are written using different letter case, all we have to modify is this one line:
if (src.regionMatches(true, ltIdx, key, 0, key.length())) {
Using this modified code, replaceable tags become case-insensitive:
Yo<H1>TITLE</H1><h3>Hi!</h3>Nice day.<H6>Hi back!</H6>End
Yo<big><big><big><b>TITLE</b></big></big></big><big>Hi!</big>Nice day.
<small>Hi back!</small>End
For performance - use StringBuilder.
For convenience you can use Map to store values and replacements.
Map<String, String> map = new HashMap<>();
map.put("<h1>","<big><big><big><b>");
map.put("</h1>","</b></big></big></big>");
map.put("<h2>","<big><big>");
...
StringBuilder builder = new StringBuilder(yourString);
for (String key : map.keySet()) {
replaceAll(builder, key, map.get(key));
}
To replace all occurrences in a StringBuilder, you can check here:
Replace all occurrences of a String using StringBuilder?
public static void replaceAll(StringBuilder builder, String from, String to)
{
int index = builder.indexOf(from);
while (index != -1)
{
builder.replace(index, index + from.length(), to);
index += to.length(); // Move to the end of the replacement
index = builder.indexOf(from, index);
}
}
Unfortunately StringBuilder doesn't provide a replace(string,string) method, so you might want to consider using Pattern and Matcher in conjunction with StringBuffer:
String input = ...;
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile("</?(h1|h2|...)>");
Matcher m = p.matcher( input );
while( m.find() )
{
String match = m.group();
String replacement = ...; //get replacement for match, e.g. by lookup in a map
m.appendReplacement( sb, replacement );
}
m.appendTail( sb );
You could do something similar with StringBuilder, but before Java 9 you'd have to implement appendReplacement etc. yourself; since Java 9, Matcher.appendReplacement and Matcher.appendTail also accept a StringBuilder.
As for the expression you could also just try and match any html tag (although that might cause problems since regex and arbitrary html don't fit very well) and when the lookup doesn't have any result you just replace the match with itself.
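Filled in with a map lookup, the loop above might look like the following sketch. It assumes Java 9 or later (for the StringBuilder overloads of appendReplacement and appendTail); the class name RegexTagReplacer, the shortened replacement table and the h[1-6] pattern are choices made for this example only.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTagReplacer {
    // Shortened table for illustration; add the remaining tags as needed.
    static final Map<String, String> REPLACEMENTS = Map.of(
            "<h1>", "<big><big><big><b>",
            "</h1>", "</b></big></big></big>",
            "<h3>", "<big>",
            "</h3>", "</big>");

    private static final Pattern TAG = Pattern.compile("</?h[1-6]>");

    static String replaceTags(String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            String match = m.group();
            // Tags without a table entry are replaced with themselves.
            String replacement = REPLACEMENTS.getOrDefault(match, match);
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Because every match is looked up exactly once, the output of one replacement is never scanned again by another.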
The particular example you provide seems to be HTML or XHTML. Trying to edit HTML or XML using regular expressions is fraught with problems. For the kind of editing you seem to be interested in doing you should look at using XSLT. Another possibility is to use SAX, the streaming XML parser, and have your back-end write the edited output on the fly. If the text is actually HTML, you might be better off using a tolerant HTML parser, such as JSoup, to build a parsed representation of the document (like the DOM), and manipulate that before outputting it.
StringBuilder is backed by a char array. So, unlike String instances, it is mutable. Thus, you can call indexOf() and replace() on the StringBuilder.
I would do something like this
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
    if (tagEquals(str, i, "h1")) {
        sb.append("<big><big><big><b>");
        i += 3; // assuming tagEquals matched "<h1>" at i: skip 3 chars here, the loop's i++ skips the 4th
    } else if (tagEquals(str, i, "/h1")) {
        ...
    } else {
        sb.append(str.charAt(i));
    }
}
tagEquals is a func which checks a tag name
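The answer does not show tagEquals itself. One possible shape for it, purely as an assumption (the class name TagScanner and the exact semantics are guesses for illustration), is a thin wrapper around String.regionMatches:

```java
class TagScanner {
    // Returns true if text contains "<name>" starting exactly at index pos.
    // The comparison ignores case, so "<H1>" is treated like "<h1>".
    static boolean tagEquals(String text, int pos, String name) {
        String tag = "<" + name + ">";
        // regionMatches returns false (no exception) if the region is out of range.
        return text.regionMatches(true, pos, tag, 0, tag.length());
    }
}
```

With this definition a matched tag occupies name.length() + 2 characters, which is how far the calling loop has to advance.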
Use Apache Commons StringUtils.replaceEach.
String[] searches = new String[]{"<h1>", "</h1>", "<h2>", ...};
String[] replacements = new String[]{"<big><big><big><b>", "</b></big></big></big>", "<big><big>", ...};
string = StringUtils.replaceEach(string, searches, replacements);
I'm using this for loop to go through my array's individual characters from a string.
for(int i=0;i< array;i++){
I need to print the characters one by one using g.drawString. Therefore I need the character at position [i] in the array to be turned into a string. How do I do this?
You can use :
String.valueOf(yourChar)
So your loop would be:
for (int i = 0; i < yourString.length(); i++) {
    g.drawString(String.valueOf(yourString.charAt(i)), x, y); // x, y: drawing position (placeholders)
}
It's just simple: "" + array.charAt(i)
Just do this:
char[] array;
g.drawString(String.valueOf(array[i]), x, y); // x, y: drawing position (placeholders)
This is the best practice way:
char[] array;
for (char c : array) {
String s = String.valueOf(c);
// do something with s
}
Something like this
char[] chars = {'a','b','c'};
String[] strings = new String[chars.length];
for (int i = 0; i < chars.length; i++) {
strings[i] = String.valueOf(chars[i]);
}
If your source is a String, the best way to do this is usually:
str.substring(i, i+1); // If you have a string.
because it avoids unnecessary character buffer copies. Some versions of Java (apparently JDK7u5 and earlier) can then reuse the existing character buffer and this way avoid an extra object creation.
(See: this announcement of the change, indicating that this holds for JDK7u5 and before)
There are two quite obvious alternatives (this one also works if you have your own char[] as data source):
String.valueOf(str.charAt(i)); // If you have a String
String.valueOf(chararray[i]); // If you have a char[]
These will actually create two objects: one String and one char[1].
And then there is the ugly hack:
"" + str.charAt(i); // If you do not care about doing things right.
which usually causes the creation of a string buffer, with a single append operation, and the conversion to a string. If you use this code in a hot loop, it may really hurt your application's performance. While the code looks really simple, it supposedly translates to:
StringBuilder temp = new StringBuilder("");
temp.append(str.charAt(i));
temp.toString();
And this overhead is actually quite useless, given that there are two clean solutions available in the API: converting characters to Strings, and constructing substrings.
I am using this code
Matcher m2 = Pattern.compile("\\b[ABE]+\\b").matcher(key);
to only get keys from a HashMap that contain the letters A, B or E
However, I am not interested in words such as AAAAAA or EEEEE; I need words with at least two different letters (in the best case, three).
Is there a way to modify the regex ? Can anyone offer insight on this?
Replace everything except your letters, make a Set of the result, test the Set for size.
public static void main (String args[])
{
String alphabet = "ABC";
String totest = "BBA";
if (args.length == 2)
{
alphabet = args[0];
totest = args[1];
}
String cleared = totest.replaceAll ("[^" + alphabet + "]", "");
char[] ca = cleared.toCharArray ();
Set <Character> unique = new HashSet <Character> ();
for (char c: ca)
unique.add (c);
System.out.println ("Result: " + (unique.size () > 1));
}
Example implementation
You could use a more complicated regex to do it e.g.
(.*A.*[BE].*|.*[BE].*A.*)|(.*B.*[AE].*|.*[AE].*B.*)|(.*E.*[BA].*|.*[BA].*E.*)
But it's probably going to be easier to understand if you do some kind of replacement, for instance a loop that replaces one letter at a time with '' and checks the size of the new string each time: if the size of the string changes twice, then you've got two of your desired characters. EDIT: actually, if you know the set of desired characters at runtime before you do the check, NullUserException had it right in his comment: indexOf or contains will be more efficient and probably more readable than this.
Note that if your set of desired characters is unknown at compile time (or at least pre-string-checking at runtime), the second option is preferable - if you're looking for any characters, just replace all occurrences of the first character in a while(str.length > 0) loop - the number of times it goes through the loop is the number of different characters you've got.
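The indexOf variant mentioned above could be sketched like this. The class and method names are made up for the example, and the alphabet string is assumed to contain each letter once and no regex metacharacters.

```java
class DistinctLetters {
    // Counts how many different characters of alphabet occur in word.
    static int distinctCount(String word, String alphabet) {
        int count = 0;
        for (int i = 0; i < alphabet.length(); i++) {
            if (word.indexOf(alphabet.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    // True if word consists only of alphabet letters and uses at least minDistinct of them.
    static boolean accept(String word, String alphabet, int minDistinct) {
        return word.matches("[" + alphabet + "]+")
                && distinctCount(word, alphabet) >= minDistinct;
    }
}
```

For instance, accept("AAB", "ABE", 2) is true while accept("AAAA", "ABE", 2) is false; passing 3 as minDistinct covers the "best case" from the question.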
Mark explicitly the repetition of desired letters,
It would look like this :
\b[ABE]{1,3}\b
It matches AAE, EEE, AEE but not AAAA, AAEE
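If the check has to stay inside a single regex, a different technique (not the one from the answer above) is a back-reference combined with a negative lookahead: capture the first letter, skip its repetitions, then require a letter that differs from it. The class name TwoDistinct is an assumption for this sketch, and it only enforces two different letters, not three.

```java
import java.util.regex.Pattern;

class TwoDistinct {
    // ([ABE])   first letter, captured as group 1
    // \1*       any repetitions of that same letter
    // (?!\1)    the next letter must differ from group 1
    // [ABE]+    rest of the word, still only A, B or E
    static final Pattern WORD = Pattern.compile("\\b([ABE])\\1*(?!\\1)[ABE]+\\b");

    static boolean matches(String key) {
        return WORD.matcher(key).find();
    }
}
```

Here matches("AAB") and matches("AE") are true, while matches("AAAA") and matches("EEEEE") are false.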
I am writing a java application; but stuck on this point.
Basically I have a string of Chinese characters that may ALSO contain some Latin characters or numbers, let's say:
查詢促進民間參與公共建設法(210BOT法).
I want to split those Chinese chars except the Latin or numbers as "BOT" above. So, at the end I will have this kind of list:
[ 查, 詢, 促, 進, 民, 間, 參, 與, 公, 共, 建, 設, 法, (, 210, BOT, 法, ), ., ]
How can I resolve this problem (for java)?
Chinese characters lie within certain Unicode ranges:
2F00-2FDF: Kangxi
4E00-9FFF: CJK
3400-4DBF: CJK Extension
So all you basically need to do is to check if the character's code point lies within the known ranges. This example is a good starting point for writing a stack-based parser/splitter; you only need to extend it to separate digits from Latin letters, which should be obvious enough (hint: Character#isDigit()):
Set<UnicodeBlock> chineseUnicodeBlocks = new HashSet<UnicodeBlock>() {{
add(UnicodeBlock.CJK_COMPATIBILITY);
add(UnicodeBlock.CJK_COMPATIBILITY_FORMS);
add(UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS);
add(UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT);
add(UnicodeBlock.CJK_RADICALS_SUPPLEMENT);
add(UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION);
add(UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS);
add(UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A);
add(UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B);
add(UnicodeBlock.KANGXI_RADICALS);
add(UnicodeBlock.IDEOGRAPHIC_DESCRIPTION_CHARACTERS);
}};
String mixedChinese = "查詢促進民間參與公共建設法(210BOT法)";
for (char c : mixedChinese.toCharArray()) {
if (chineseUnicodeBlocks.contains(UnicodeBlock.of(c))) {
System.out.println(c + " is chinese");
} else {
System.out.println(c + " is not chinese");
}
}
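Extending this to the splitter the question asks for could look like the sketch below. It swaps the block set for Character.isIdeographic (available since Java 7) and walks code points instead of chars, so characters outside the Basic Multilingual Plane are handled too. The class name MixedSplitter and the three token kinds are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;

class MixedSplitter {
    private enum Kind { DIGITS, LETTERS, SINGLE }

    private static Kind kindOf(int cp) {
        if (Character.isDigit(cp)) return Kind.DIGITS;
        // Ideographs are letters too, so exclude them before grouping letters.
        if (Character.isLetter(cp) && !Character.isIdeographic(cp)) return Kind.LETTERS;
        return Kind.SINGLE; // ideographs, punctuation, whitespace, ...
    }

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder(); // current run of digits or letters
        Kind runKind = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            Kind kind = kindOf(cp);
            // Close the current run when the kind changes or a single-token character arrives.
            if (run.length() > 0 && (kind == Kind.SINGLE || kind != runKind)) {
                tokens.add(run.toString());
                run.setLength(0);
            }
            if (kind == Kind.SINGLE) {
                tokens.add(new String(Character.toChars(cp)));
            } else {
                run.appendCodePoint(cp);
                runKind = kind;
            }
        }
        if (run.length() > 0) tokens.add(run.toString());
        return tokens;
    }
}
```

For the tail of the example string, split("法(210BOT法).") returns [法, (, 210, BOT, 法, ), .]. Non-ideographic letters of any script (not only Latin) are grouped into runs here, which may or may not be what you want.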
Good luck.
Disclaimer: I'm a complete Lucene newbie.
Using the latest version of Lucene (3.6.0 at the time of writing) I manage to get close to the result you require.
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36, Collections.emptySet());
List<String> words = new ArrayList<String>();
TokenStream tokenStream = analyzer.tokenStream("content", new StringReader(original));
CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute.class);
try {
tokenStream.reset(); // Resets this stream to the beginning. (Required)
while (tokenStream.incrementToken()) {
words.add(termAttribute.toString());
}
tokenStream.end(); // Perform end-of-stream operations, e.g. set the final offset.
}
finally {
tokenStream.close(); // Release resources associated with this stream.
}
The result I get is:
[查, 詢, 促, 進, 民, 間, 參, 與, 公, 共, 建, 設, 法, 210bot, 法]
Here's an approach I would take.
You can use Character.codePointAt(char[] charArray, int index) to return the Unicode value for a char in your char array.
You will also need a mapping of Latin Unicode characters.
If you look in the source of Character.UnicodeBlock, the Latin blocks (Basic Latin through Latin Extended-B) cover the interval [0x0000, 0x024F]. So basically you check if your Unicode code point is somewhere within that interval.
I suspect there is a way to just use a Character.Subset to check if it contains your char, but I haven't looked into that.
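Instead of hard-coding an interval, another option is Character.UnicodeScript, which exists since Java 7. This is a sketch of that alternative, not the Character.Subset idea mentioned above; the class name LatinCheck is made up.

```java
class LatinCheck {
    // True for Latin-script letters such as 'B' or 'é'.
    // Digits and punctuation belong to the COMMON script, so they return false.
    static boolean isLatin(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.LATIN;
    }
}
```

Unlike the plain interval test, this does not classify digits, control characters or punctuation in the 0x0000-0x007F range as Latin.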