Converting a String to small caps pseudoalphabet in Java - java

I found a website which can convert any text to different obscure unicode font styles, e.g. Small Caps pseudoalphabet.
I'm interested in doing the same thing in Java code. The following HxD screenshot shows the bytes of both text versions:
Is there any way to do the conversion in Java with built-in methods or a library? Preferrably the result will be another String object.

Quoting the website you linked:
What makes an alphabet "psuedo"?
One or more of the letters transliterated has a different meaning or source than intended. In the non-bold version of Fraktur, for
example, several letters are "black letter" but most are "mathematical
fraktur". In the Faux Cyrillic and Faux Ethiopic, letters are selected
merely based on superficial similarities, rather than phonetic or
semantic similarities.
So there is no well-defined smallcaps transformation; rather, the author of the converter hand-picked codepoint mappings to give the desired effect.
In the case of small caps, this is probably because there is no small-caps version of x in unicode.
In order to recreate the same effect, you'll have to implement a codepoint conversion lookup table (which you could generate by, e.g., passing the whole alphabet to the transformer)

The Unicode specification has an official, stable name for each and every codepoint. You can take advantage of this by looking up “LATIN LETTER SMALL CAPITAL c” using the method Character.codePointOf(String).
public static String translate(String s) {
int len = s.length();
Formatter smallCaps = new Formatter(new StringBuilder(len));
for (int i = 0; i < len; i++) {
char c = s.charAt(i);
if (c >= 'A' && c <= 'Z' && c != 'X') {
smallCaps.format("%c",
Character.codePointOf("LATIN LETTER SMALL CAPITAL " + c));
} else {
smallCaps.format("%c", c);
}
}
return smallCaps.toString();
}
I put && c != 'X' in the test because there currently is no LATIN LETTER SMALL CAPITAL X character, though it has been proposed.
Note that some small capital codepoints may not be in Java’s internal copy of the Unicode character data table. I found that I needed to use Java 12 or later to recognize them all.

I just found a simple solution by translating the plain text alphabet to the Unicode "small caps" alphabet as follows:
private static final String[] ALPHABET = "abcdefghijklmnopqrstuvwxyz".split("");
private static final String[] SMALL_CAPS_ALPHABET = "ᴀʙᴄᴅᴇꜰɢʜɪᴊᴋʟᴍɴᴏᴩqʀꜱᴛᴜᴠᴡxyᴢ".split("");
private static String toSmallCaps(String text)
{
text = text.toLowerCase();
StringBuilder convertedBuilder = new StringBuilder();
for (char textCharacter : text.toCharArray())
{
int index = 0;
boolean successfullyTranslated = false;
for (String alphabetLetter : ALPHABET)
{
if ((textCharacter + "").equals(alphabetLetter))
{
convertedBuilder.append(SMALL_CAPS_ALPHABET[index]);
successfullyTranslated = true;
break;
}
index++;
}
if (!successfullyTranslated)
{
convertedBuilder.append(textCharacter);
}
}
return convertedBuilder.toString();
}
Usage:
String smallCaps = toSmallCaps("Hello StackOverflow!");
System.out.println(smallCaps);
Output:
ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ!
It's not the most elegant or extendable solution but maybe someone can suggest improvements.

The answer posted by #BullyWiiPlaza is a good one, but the code is pretty inefficient.
Here is an alternative implementation which will be much faster and uses less memory:
private static final char[] SMALL_CAPS_ALPHABET = "ᴀʙᴄᴅᴇꜰɢʜɪᴊᴋʟᴍɴᴏᴩqʀꜱᴛᴜᴠᴡxyᴢ".toCharArray();
private static String toSmallCaps(String text)
{
if(null == text) {
return null;
}
int length = text.length();
StringBuilder smallCaps = new StringBuilder(length);
for(int i=0; i<length; ++i) {
char c = text.charAt(i);
if(c >= 'a' && c <= 'z') {
smallCaps.append(SMALL_CAPS_ALPHABET[c - 'a']);
} else {
smallCaps.append(c);
}
}
return smallCaps.toString();
}

Related

How to check if String contains Latin letters without regex

I want to check if String contains only Latin letters but also can contains numbers and other symbols like: _/+), etc.
String utm_source=google should pass, utm_source=google&2019_and_2020! should pass too. But utm_ресурс=google should not pass (coz cyrillic letters). I know code with regex, but how can i do it without using regex and classic for loop, maybe with Streams and Character class?
Use this code
public static boolean isValidUsAscii (String s) {
return Charset.forName("US-ASCII").newEncoder().canEncode(s);
}
For restricted "latin" (no é etcetera), it must be either US-ASCII (7 bits), or ISO-8859-1 but without accented letters.
boolean isBasicLatin(String s) {
return s.codePoints().allMatch(cp -> cp < 128 || (cp < 256 && !isLetter(cp)));
}
Less of a neat single line approach but really all you need to do is check whether the numeric value of the character is within certain limits like so:
public boolean isQwerty(String text) {
int length = text.length();
for(int i = 0; i < length; i++) {
char character = text.charAt(i);
int ascii = character;
if(ascii<32||ascii>126) {
return false;
}
}
return true;
}
Test Run
ä returns false
abc returns true

Algorithm for IP Address-like String extraction? [Java]

Given a strings like:
S5.4.2
3.2
SD45.G.94L5456.294.1004.8888.0.23QWZZZZ
5.44444444444444444444444444444.5GXV
You would need to return:
5.4.2
3.2
5456.294.1004.8888.0.23
5.44444444444444444444444444444.5
Is there a smart way to write a method to extract only the IP address-like number from it? (I say IP Address-like because I'm not sure if IP addresses are usually a set amount of numbers or not). One of my friends suggested that there might be a regex for what I'm looking for and so I found this. The problem with that solution is I think it is limited to only 4 integers total, and also won't expect ints like with 4+ digits in between dots.
I would need it to recognize the pattern:
NUMS DOT (only one) NUMS DOT (only one) NUMS
Meaning:
234..54.89 FAIL
.54.6.10 FAIL
256 FAIL
234.7885.924.120.2.0.4 PASS
Any ideas? Thanks everyone. I've been at this for a few hours and can't figure this out.
Here is an approach using Regex:
private static String getNumbers(String toTest){
String IPADDRESS_PATTERN =
"(\\d+\\.)+\\d+";
Pattern pattern = Pattern.compile(IPADDRESS_PATTERN);
Matcher matcher = pattern.matcher(toTest);
if (matcher.find()) {
return matcher.group();
}
else{
return "did not match anything..";
}
}
This will match the number-dot-number-... pattern with an infinite amount of numbers.
I modified this Answer a bit.
There are many ways to do it. This is the best way, I guess.
public static String NonDigits(final CharSequence input)
{
final StringBuilder sb = new StringBuilder(input.length());
for(int i = 0; i < input.length(); i++){
final char c = input.charAt(i);
if(c > 47 && c < 58){
sb.append(c);
}
}
return sb.toString();
}
A CharSequence is a readable sequence of characters. This interface
provides uniform, read-only access to many different kinds of
character sequences.
Look at this ASCII Table. The ascii values from 47 to 58 are 0 to 9. So, this is the best possible way to extract the digits than any other way.
"final" keyword is more like if set once, cannot be modified. Declaring string as final is good practise.
Same code, just to understand the code in a better way:
public static String NonDigits(String input)
{
StringBuilder sb = new StringBuilder(input.length());
for(int i = 0; i < input.length(); i++)
{
char c = input.charAt(i);
if(c > 47 && c < 58)
{
sb.append(c);
}
}
return sb.toString();
}

RegEx to find URLs in HTML takes 25 seconds in Java/Android

In Android/Java, given a website's HTML source code, I would like to extract all XML and CSV file paths.
What I am doing (with RegEx) is this:
final HashSet<String> urls = new HashSet<String>();
final Pattern urlRegex = Pattern.compile(
"[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|].(xml|csv)");
final Matcher url = urlRegex.matcher(htmlString);
while (url.find()) {
urls.add(makeAbsoluteURL(url.group(0)));
}
public String makeAbsoluteURL(String url) {
if (url.startsWith("http://") || url.startsWith("http://")) {
return url;
}
else if (url.startsWith("/")) {
return mRootURL+url.substring(1);
}
else {
return mBaseURL+url;
}
}
Unfortunately, this runs for about 25 seconds for an average website with normal length. What is going wrong? Is my RegEx just bad? Or is RegEx just so slow?
Can I find the URLs faster without RegEx?
Edit:
The source for the valid characters was (roughly) this answer. However, I think the two character classes (square brackets) must be swapped so that you have a more limited character set for the first char of the URL and a broader character class for all remaining chars. This was the intention.
Your regex is written in a way that makes it slow for long inputs.
The * operator is greedy.
For instance for input:
http://stackoverflow.com/questions/19019504/regex-to-find-urls-in-html-takes-25-seconds-in-java-android.xml
The [-a-zA-Z0-9+&##/%?=~_|!:,.;]* part of the regex will consume the whole string. It will then try to match the next character group, which will fail (since whole string is consumed). It will then backtrack in match of first part of the regex by one character and try to match the second character group again. It will match. Then it will try to match the dot and fail because the whole string is consumed. Another backtrack etc...
In essence your regex is forcing a lot of backtracking to match anything. It will also waste a lot of time on matches that have no way of succeeding.
For word forest it will first consume whole word in the first part of expression and then repeatedly backtrack after failing to match the rest of expression. Huge waste of time.
Also:
the . in regex is unescaped and it will match ANY character.
url.group(0) is redundant. url.group() has same meaning
In order to speed up the regex you need to figure out a way to reduce the amount of backtracking and it would also help if you had a less general start of the match. Right now every single word will cause matching to start and generally fail. For instance typically in html all the links are inside 2 ". If that's the case you can start your matching at " which will speed it up tremendously. Try to find a better start of the expression.
I've nothing the say in the theoretical overview that U Mad did, he highlighted everything I'd noticed.
What I would like to suggest you, considering what are you look for with the RE, is to change the point of view of your RE :)
You are looking for xml and csv files, so why don't you reverse the html string, for example using:
new StringBuilder("bla bla bla foo letme/find.xml bla bla").reverse().toString()
after that you could look for the pattern:
final Pattern urlRegex = Pattern.compile(
"(vsc|lmx)\\.[-a-zA-Z0-9+&##/%=~_|][-a-zA-Z0-9+&##/%?=~_|!:,.;]*");
urlRegex pattern could be refined as U Mad has already suggested. But in this way you could reduce the number of failed matches.
I had my doubts, if there can be a String really long enough to take 25 seconds for parsing. So I tried and must admit now, that with about 27MB of text, it takes around 25 seconds to parse it with the given regular expression.
Being curious I changed the little test program with #FabioDch's approach (so, please vote for him, if you want to vote anywhere :-)
The result is quite impressing: Instead of 25 Seconds, #FabioDch's approach needed less then 1 second (100ms to 800ms) + 70ms to 85ms for reversing!
Here's the code I used. It reads text from the largest text file I've found and copies it 10 time to get 27MB of text. Then runs the regex against it and prints out the results.
#Test
public final void test() throws IOException {
final Pattern urlRegex = Pattern.compile("(lmx|vsc)\\.[-a-zA-Z0-9+&##/%=~_|][-a-zA-Z0-9+&##/%?=~_|!:,.;]*");
printTimePassed("initialized");
List<String> lines = Files.readAllLines(Paths.get("testdata", "Aster_Express_User_Guide_0500.txt"), Charset.defaultCharset());
StringBuilder sb = new StringBuilder();
for(int i=0; i<10; i++) { // Copy 10 times to get more useful data
for(String line : lines) {
sb.append(line);
sb.append('\n');
}
}
printTimePassed("loaded: " + lines.size() + " lines, in " + sb.length() + " chars");
String html = sb.reverse().toString();
printTimePassed("reversed");
int i = 0;
final Matcher url = urlRegex.matcher(html);
while (url.find()) {
System.out.println(i++ + ": FOUND: " + new StringBuilder(url.group()).reverse() + ", " + url.start() + ", " + url.end());
}
printTimePassed("ready");
}
private void printTimePassed(String msg) {
long current = System.currentTimeMillis();
System.out.printf("%s: took %d ms\n", msg, (current - ms));
ms = current;
}
Would suggest only using the regex to find file extensions (.xml or .csv). This should be a lot faster and when found, you can look backwards, examining each character before and stop when you reach one that couldn't be in a URL - see below:
final HashSet<String> urls = new HashSet<String>();
final Pattern fileExtRegex = Pattern.compile("\\.(xml|csv)");
final Matcher fileExtMatcher = fileExtRegex.matcher(htmlString);
// Find next occurrence of ".xml" or ".csv" in htmlString
while (fileExtMatcher.find()) {
// Go backwards from the character just before the file extension
int dotPos = fileExtMatcher.start() - 1;
int charPos = dotPos;
while (charPos >= 0) {
// Break if current character is not a valid URL character
char chr = htmlString.charAt(charPos);
if (!((chr >= 'a' && chr <= 'z') ||
(chr >= 'A' && chr <= 'Z') ||
(chr >= '0' && chr <= '9') ||
chr == '-' || chr == '+' || chr == '&' || chr == '#' ||
chr == '#' || chr == '/' || chr == '%' || chr == '?' ||
chr == '=' || chr == '~' || chr == '|' || chr == '!' ||
chr == ':' || chr == ',' || chr == '.' || chr == ';')) {
break;
}
charPos--;
}
// Extract/add URL if there are valid URL characters before file extension
if ((dotPos > 0) && (charPos < dotPos)) {
String url = htmlString.substring(charPos + 1, fileExtMatcher.end());
urls.add(makeAbsoluteURL(url));
}
}
Small disclaimer: I used part of your original regex for valid URL characters: [-a-zA-Z0-9+&##/%?=~_|!:,.;]. Haven't verified if this is comprehensive and there are perhaps further improvements that could be made, e.g. it would currently find local file paths (e.g. C:\TEMP\myfile.xml) as well as URLs. Wanted to keep the code above simple to demonstrate the technique so haven't tackled this.
EDIT Following the comment about effiency I've modified to no longer use a regex for checking valid URL characters. Instead, it compares the character against valid ranges manually. Uglier code but should be faster...
I know people love to use regex to parse html, but have you considered using jsoup?
For sake of clarity I created a separate answer for this regex:
Edited to escape the dot and remove reluctant quant.
(?<![-a-zA-Z0-9+&##/%=~_|])[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]‌\\​.(xml|csv)
Please try this one and tell me how it goes.
Also here's a class which will enable you to search a reversed string without actually reversing it:
public class ReversedString implements CharSequence {
public ReversedString(String input) {
this.s = input;
this.len = s.length();
}
private final String s;
private final int len;
#Override
public CharSequence subSequence(final int start, final int end) {
return new CharSequence() {
#Override
public CharSequence subSequence(int start, int end) {
throw new UnsupportedOperationException();
}
#Override
public int length() {
return end-start;
}
#Override
public char charAt(int index) {
return s.charAt(len-start-index-1);
}
#Override
public String toString() {
StringBuilder buf = new StringBuilder(end-start);
for(int i = start;i < end;i++) {
buf.append(s.charAt(len-i-1));
}
return buf.toString();
}
};
}
#Override
public int length() {
return len;
}
#Override
public char charAt(int index) {
return s.charAt(len-1-index);
}
}
You can use this class as such:
pattern.matcher(new ReversedString(inputString));

Efficient way to replace chars in a string (java)?

I'm writing a small JAVA program which:
takes a text as a String
takes 2 arrays of chars
What im trying to do will sound like "find and replace" but it is not the same so i thought its important to clear it.
Anyway I want to take this text, find if any char from the first array match a char in the text and if so, replace it with the matching char (according to index) from the second char array.
I'll explain with an example:
lets say my text (String) is: "java is awesome!";
i have 2 arrays (char[]): "absm" and "!#*$".
The wished result is to change 'a' to '!' , 'b' to '#' and so on..
meaning the resulted text will be:
"java is awesome!" changed to -> "j#v# i* #w*o$e!"
What is the most efficient way of doing this and why?
I thought about looping the text, but then i found it not so efficient.
(StringBuilder/String class can be used)
StringBuilder sb = new StringBuilder(text);
for(int i = 0; i<text.length(); i ++)
{
for (int j = 0; j < firstCharArray.length;j++)
{
if (sb.charAt(i) == firstCharArray[j])
{
sb.setCharAt(i, secondCharArray[j]);
break;
}
}
}
This way is efficient because it uses a StringBuilder to change the characters in place (if you used Strings you would have to create new ones each time because they are immutable.) Also it minimizes the amount of passes you have to do (1 pass through the text string and n passes through the first array where n = text.length())
I guess you are looking for StringUtils.replaceEach, at least as a reference.
How efficient do you need it to be? Are you doing this for hundreds, thousands, millions of words???
I don't know if it's the most efficent, but you could use the string indexOf() method on each of your possible tokens, it will tell you if it's there, and then you can replace that index at the same time with the corresponding char from the other array.
Codewise, something like (this is half pseudo code by the way):
for(each of first array) {
int temp = YourString.indexOf(current array field);
if (temp >=0) {
replace with other array
}
}
Put the 2 arrays you have in a Map
Map<Character, Character> //or Map of Strings
where the key is "a", "b" etc... and the value is the character you want to substitute with - "#" etc....
Then simply replace the keys in your String with the values.
For small stuff like this, an indexOf() search is likely to be faster than a map, while "avoiding" the inner loop of the accepted answer. Of course, the loop is still there, inside String.indexOf(), but it's likely to be optimized to a fare-thee-well by the JIT-compiler, because it's so heavily used.
static String replaceChars(String source, String from, String to)
{
StringBuilder dest = new StringBuilder(source);
for ( int i = 0; i < source.length(); i++ )
{
int foundAt = from.indexOf(source.charAt(i));
if ( foundAt >= 0 )
dest.setCharAt(i,to.charAt(foundAt));
}
return dest.toString();
}
Update: The Oracle/Sun JIT uses SIMD on at least some processors for indexOf(), making it even faster than one would guess.
Since the only way to know if a character should be replaced is to check it, you (or any util method) have to loop through the whole text, character after the other. You can never achieve better complexity than O(n) (n be the number of characters in the text).
This utility class that replaces a char or a group of chars of a String. It is equivalent to bash tr and perl tr///, aka, transliterate.
/**
* Utility class that replaces chars of a String, aka, transliterate.
*
* It's equivalent to bash 'tr' and perl 'tr///'.
*
*/
public class ReplaceChars {
public static String replace(String string, String from, String to) {
return new String(replace(string.toCharArray(), from.toCharArray(), to.toCharArray()));
}
public static char[] replace(char[] chars, char[] from, char[] to) {
char[] output = chars.clone();
for (int i = 0; i < output.length; i++) {
for (int j = 0; j < from.length; j++) {
if (output[i] == from[j]) {
output[i] = to[j];
break;
}
}
}
return output;
}
/**
* For tests!
*/
public static void main(String[] args) {
// Example from: https://en.wikipedia.org/wiki/Caesar_cipher
String string = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG";
String from = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
String to = "XYZABCDEFGHIJKLMNOPQRSTUVW";
System.out.println();
System.out.println("Cesar cypher: " + string);
System.out.println("Result: " + ReplaceChars.replace(string, from, to));
}
}
This is the output:
Cesar cypher: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Result: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD

How can I check if a single character appears in a string?

In Java is there a way to check the condition:
"Does this single character appear at all in string x"
without using a loop?
You can use string.indexOf('a').
If the char a is present in string :
it returns the the index of the first occurrence of the character in
the character sequence represented by this object, or -1 if the
character does not occur.
String.contains() which checks if the string contains a specified sequence of char values
String.indexOf() which returns the index within the string of the first occurence of the specified character or substring (there are 4 variations of this method)
I'm not sure what the original poster is asking exactly. Since indexOf(...) and contains(...) both probably use loops internally, perhaps he's looking to see if this is possible at all without a loop? I can think of two ways off hand, one would of course be recurrsion:
public boolean containsChar(String s, char search) {
if (s.length() == 0)
return false;
else
return s.charAt(0) == search || containsChar(s.substring(1), search);
}
The other is far less elegant, but completeness...:
/**
* Works for strings of up to 5 characters
*/
public boolean containsChar(String s, char search) {
if (s.length() > 5) throw IllegalArgumentException();
try {
if (s.charAt(0) == search) return true;
if (s.charAt(1) == search) return true;
if (s.charAt(2) == search) return true;
if (s.charAt(3) == search) return true;
if (s.charAt(4) == search) return true;
} catch (IndexOutOfBoundsException e) {
// this should never happen...
return false;
}
return false;
}
The number of lines grow as you need to support longer and longer strings of course. But there are no loops/recurrsions at all. You can even remove the length check if you're concerned that that length() uses a loop.
You can use 2 methods from the String class.
String.contains() which checks if the string contains a specified sequence of char values
String.indexOf() which returns the index within the string of the first occurence of the specified character or substring or returns -1 if the character is not found (there are 4 variations of this method)
Method 1:
String myString = "foobar";
if (myString.contains("x") {
// Do something.
}
Method 2:
String myString = "foobar";
if (myString.indexOf("x") >= 0 {
// Do something.
}
Links by: Zach Scrivena
String temp = "abcdefghi";
if(temp.indexOf("b")!=-1)
{
System.out.println("there is 'b' in temp string");
}
else
{
System.out.println("there is no 'b' in temp string");
}
If you need to check the same string often you can calculate the character occurrences up-front. This is an implementation that uses a bit array contained into a long array:
public class FastCharacterInStringChecker implements Serializable {
private static final long serialVersionUID = 1L;
private final long[] l = new long[1024]; // 65536 / 64 = 1024
public FastCharacterInStringChecker(final String string) {
for (final char c: string.toCharArray()) {
final int index = c >> 6;
final int value = c - (index << 6);
l[index] |= 1L << value;
}
}
public boolean contains(final char c) {
final int index = c >> 6; // c / 64
final int value = c - (index << 6); // c - (index * 64)
return (l[index] & (1L << value)) != 0;
}}
To check if something does not exist in a string, you at least need to look at each character in a string. So even if you don't explicitly use a loop, it'll have the same efficiency. That being said, you can try using str.contains(""+char).
Is the below what you were looking for?
int index = string.indexOf(character);
return index != -1;
Yes, using the indexOf() method on the string class. See the API documentation for this method
String.contains(String) or String.indexOf(String) - suggested
"abc".contains("Z"); // false - correct
"zzzz".contains("Z"); // false - correct
"Z".contains("Z"); // true - correct
"😀and😀".contains("😀"); // true - correct
"😀and😀".contains("😂"); // false - correct
"😀and😀".indexOf("😀"); // 0 - correct
"😀and😀".indexOf("😂"); // -1 - correct
String.indexOf(int) and carefully considered String.indexOf(char) with char to int widening
"😀and😀".indexOf("😀".charAt(0)); // 0 though incorrect usage has correct output due to portion of correct data
"😀and😀".indexOf("😂".charAt(0)); // 0 -- incorrect usage and ambiguous result
"😀and😀".indexOf("😂".codePointAt(0)); // -1 -- correct usage and correct output
The discussions around character is ambiguous in Java world
can the value of char or Character considered as single character?
No. In the context of unicode characters, char or Character can sometimes be part of a single character and should not be treated as a complete single character logically.
if not, what should be considered as single character (logically)?
Any system supporting character encodings for Unicode characters should consider unicode's codepoint as single character.
So Java should do that very clear & loud rather than exposing too much of internal implementation details to users.
String class is bad at abstraction (though it requires confusingly good amount of understanding of its encapsulations to understand the abstraction 😒😒😒 and hence an anti-pattern).
How is it different from general char usage?
char can be only be mapped to a character in Basic Multilingual Plane.
Only codePoint - int can cover the complete range of Unicode characters.
Why is this difference?
char is internally treated as 16-bit unsigned value and could not represent all the unicode characters using UTF-16 internal representation using only 2-bytes. Sometimes, values in a 16-bit range have to be combined with another 16-bit value to correctly define character.
Without getting too verbose, the usage of indexOf, charAt, length and such methods should be more explicit. Sincerely hoping Java will add new UnicodeString and UnicodeCharacter classes with clearly defined abstractions.
Reason to prefer contains and not indexOf(int)
Practically there are many code flows that treat a logical character as char in java.
In Unicode context, char is not sufficient
Though the indexOf takes in an int, char to int conversion masks this from the user and user might do something like str.indexOf(someotherstr.charAt(0))(unless the user is aware of the exact context)
So, treating everything as CharSequence (aka String) is better
public static void main(String[] args) {
System.out.println("😀and😀".indexOf("😀".charAt(0))); // 0 though incorrect usage has correct output due to portion of correct data
System.out.println("😀and😀".indexOf("😂".charAt(0))); // 0 -- incorrect usage and ambiguous result
System.out.println("😀and😀".indexOf("😂".codePointAt(0))); // -1 -- correct usage and correct output
System.out.println("😀and😀".contains("😀")); // true - correct
System.out.println("😀and😀".contains("😂")); // false - correct
}
Semantics
char can handle most of the practical use cases. Still its better to use codepoints within programming environment for future extensibility.
codepoint should handle nearly all of the technical use cases around encodings.
Still, Grapheme Clusters falls out of the scope of codepoint level of abstraction.
Storage layers can choose char interface if ints are too costly(doubled). Unless storage cost is the only metric, its still better to use codepoint. Also, its better to treat storage as byte and delegate semantics to business logic built around storage.
Semantics can be abstracted at multiple levels. codepoint should become lowest level of interface and other semantics can be built around codepoint in runtime environment.
package com;
public class _index {
public static void main(String[] args) {
String s1="be proud to be an indian";
char ch=s1.charAt(s1.indexOf('e'));
int count = 0;
for(int i=0;i<s1.length();i++) {
if(s1.charAt(i)=='e'){
System.out.println("number of E:=="+ch);
count++;
}
}
System.out.println("Total count of E:=="+count);
}
}
static String removeOccurences(String a, String b)
{
StringBuilder s2 = new StringBuilder(a);
for(int i=0;i<b.length();i++){
char ch = b.charAt(i);
System.out.println(ch+" first index"+a.indexOf(ch));
int lastind = a.lastIndexOf(ch);
for(int k=new String(s2).indexOf(ch);k > 0;k=new String(s2).indexOf(ch)){
if(s2.charAt(k) == ch){
s2.deleteCharAt(k);
System.out.println("val of s2 : "+s2.toString());
}
}
}
System.out.println(s1.toString());
return (s1.toString());
}
you can use this code. It will check the char is present or not. If it is present then the return value is >= 0 otherwise it's -1. Here I am printing alphabets that is not present in the input.
import java.util.Scanner;
public class Test {
public static void letters()
{
System.out.println("Enter input char");
Scanner sc = new Scanner(System.in);
String input = sc.next();
System.out.println("Output : ");
for (char alphabet = 'A'; alphabet <= 'Z'; alphabet++) {
if(input.toUpperCase().indexOf(alphabet) < 0)
System.out.print(alphabet + " ");
}
}
public static void main(String[] args) {
letters();
}
}
//Ouput Example
Enter input char
nandu
Output :
B C E F G H I J K L M O P Q R S T V W X Y Z
If you see the source code of indexOf in JAVA:
public int indexOf(int ch, int fromIndex) {
final int max = value.length;
if (fromIndex < 0) {
fromIndex = 0;
} else if (fromIndex >= max) {
// Note: fromIndex might be near -1>>>1.
return -1;
}
if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
// handle most cases here (ch is a BMP code point or a
// negative value (invalid code point))
final char[] value = this.value;
for (int i = fromIndex; i < max; i++) {
if (value[i] == ch) {
return i;
}
}
return -1;
} else {
return indexOfSupplementary(ch, fromIndex);
}
}
you can see it uses a for loop for finding a character. Note that each indexOf you may use in your code, is equal to one loop.
So, it is unavoidable to use loop for a single character.
However, if you want to find a special string with more different forms, use useful libraries such as util.regex, it deploys stronger algorithm to match a character or a string pattern with Regular Expressions. For example to find an email in a string:
String regex = "^(.+)#(.+)$";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(email);
If you don't like to use regex, just use a loop and charAt and try to cover all cases in one loop.
Be careful recursive methods has more overhead than loop, so it's not recommended.
how about one uses this ;
let text = "Hello world, welcome to the universe.";
let result = text.includes("world");
console.log(result) ....// true
the result will be a true or false
this always works for me
You won't be able to check if char appears at all in some string without atleast going over the string once using loop / recursion ( the built-in methods like indexOf also use a loop )
If the no. of times you look up if a char is in string x is more way more than the length of the string than I would recommend using a Set data structure as that would be more efficient than simply using indexOf
String s = "abc";
// Build a set so we can check if character exists in constant time O(1)
Set<Character> set = new HashSet<>();
int len = s.length();
for(int i = 0; i < len; i++) set.add(s.charAt(i));
// Now we can check without the need of a loop
// contains method of set doesn't use a loop unlike string's contains method
set.contains('a') // true
set.contains('z') // false
Using set you will be able to check if character exists in a string in constant time O(1) but you will also use additional memory ( Space complexity will be O(n) ).

Categories

Resources