How many bytes does a string contain?


public class ClassToTestSnippets {
    private static ClassToTestSnippets ctts;

    public static void main(String[] args) {
        ctts = new ClassToTestSnippets();
        ctts.testThisMethod();
    }

    public void testThisMethod() {
        System.out.println("\u2014".length()); // answer is 1
    }
}
The code above prints 1. But \u2014 is E2 80 94 in UTF-8, i.e. 3 bytes. How do I find out how many bytes a string contains?

Depends. What encoding do you want to use?
System.out.println("äö".getBytes("UTF-8").length);
Prints 4, but if I change UTF-8 to ISO-8859-1 (for example), it'll print 2. Other encodings may print other values (try UTF-32).
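A minimal sketch of the same comparison, assuming the Charset constants from java.nio.charset.StandardCharsets are used instead of charset names (this avoids the checked UnsupportedEncodingException of the String-name overload). The class EncodedLength and its helper byteCount are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
final class EncodedLength {

    // Number of bytes the string occupies once encoded with the given charset.
    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Written with escapes so the source file encoding does not matter.
        String umlauts = "\u00E4\u00F6"; // a-umlaut, o-umlaut
        String dash = "\u2014";          // the character from the question

        System.out.println(byteCount(umlauts, StandardCharsets.UTF_8));      // 4
        System.out.println(byteCount(umlauts, StandardCharsets.ISO_8859_1)); // 2
        System.out.println(byteCount(dash, StandardCharsets.UTF_8));         // 3
        System.out.println(dash.length());                                   // 1
    }
}
```

The byte count is a property of the string and the chosen encoding together, so the charset is passed explicitly rather than relying on the platform default.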

Internally, it contains (number of chars) * 2 bytes, as each char in Java takes up two bytes (a char in Java is a 16-bit UTF-16 code unit). For \u2014 the actual bytes are 0x20 and 0x14.
However, the length() method returns the number of chars, not the number of bytes.
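To make the 16-bit code units visible, here is a small sketch that dumps the bytes of the same character in two encodings. The class CodeUnits and the helper hex are illustrative names only. It also shows a character outside the Basic Multilingual Plane, where length() counts two chars for a single code point.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class CodeUnits {

    // Formats bytes as space-separated uppercase hex pairs, e.g. "20 14".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dash = "\u2014";
        // UTF-16BE yields the big-endian code units without a byte order mark.
        System.out.println(hex(dash.getBytes(StandardCharsets.UTF_16BE))); // 20 14
        System.out.println(hex(dash.getBytes(StandardCharsets.UTF_8)));    // E2 80 94

        // U+1F600 is stored as a surrogate pair, i.e. two chars.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
    }
}
```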

Related

IOStream writing data to console

I have read that only the low-order 8 bits are used when writing to a byte output stream, so why am I getting 5?
Also, why am I not getting the binary or hex format of 65?
If I delete the two leading zeros and set b to 65, I get 'A' as the answer, but why do I get '5' instead when the two leading zeros are there?
Also, why am I getting the answer as a character and not in binary format, given that 'out' is a byte OutputStream object and should write bytes?
public static void main(String[] args) {
    int b = 0065;
    System.out.write(b);
    System.out.flush();
}
Desired: 'A'; actual: 5.
Also desired: 0100 0001.

The static field out in class java.lang.System has type java.io.PrintStream.
Class PrintStream has several write() methods. In your code, the argument you pass to write() is an int, so the method invoked is write(int), which writes the low-order byte unchanged; the console then displays that byte as a character, not as binary or hex digits. You are assigning a number literal to your local variable b. In Java, a number literal that begins with a zero (0) is an octal number; 65 in octal is 53 (fifty-three) in decimal, and 53 is the ASCII code for the digit 5 (five). To print a binary representation, class java.lang.Integer has the static method toBinaryString(); I suggest you look at the javadoc for that method.
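The points above can be sketched in a few lines. The class OctalLiteralDemo and the helper toBinary8 are hypothetical names; the padding to eight digits is one possible way to get the 0100 0001 form the question asks for, since Integer.toBinaryString drops leading zeros.

```java
// Hypothetical demo class; names are illustrative only.
final class OctalLiteralDemo {

    // Zero-padded 8-bit binary form of the low-order byte of value.
    static String toBinary8(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        int octal = 0065;  // leading zero: octal literal, decimal 53
        int decimal = 65;  // plain decimal literal

        System.out.println(octal);              // 53
        System.out.println((char) octal);       // 5
        System.out.println((char) decimal);     // A
        System.out.println(toBinary8(decimal)); // 01000001

        // write(int) emits the low-order byte as-is; the console renders it as a character.
        System.out.write(decimal);
        System.out.println();
        System.out.flush();
    }
}
```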

public class StackOverFlow {
    public static void main(String[] args) {
        int x = 0065; // in Java, a leading zero (as in 011 or 0023) marks an octal literal; printing shows the decimal value
        System.out.println(x); // 0065 octal is 53 in decimal and 35 in hexadecimal
        int y = 056; // octal value
        System.out.println(y); // Output: 46 (decimal value)
    }
}

Because 0065 is octal for decimal 53 (hexadecimal 0x35), which is the ASCII code for the character 5.

Decode encoded text with ASCII

I'm using the method below to encode a given text.
static long encodeText(String text) {
    long l = 31;
    for (int i = 0; i < text.length(); i++) {
        l = l * 47 + text.getBytes()[i] % 97;
    }
    return l;
}
When I call the method above as encodeText("stackoverflow"), it returns the encoded value 3818417496786881978.
Now I want to supply the encoded value and get the String back. For example, if I pass 3818417496786881978 to decodeText(long encoded), I need to get stackoverflow as output.
static String decodeText(long encoded) {
    String str = null;
    // decode steps here
    return str;
}
How can I do this?

Think this through logically: the clear text "stackoverflow", represented as 7-bit ASCII, carries 13 times 7 bits (= 91 bits) of information. That's more than a long (64 bits) can hold, so your encoding loses information, which makes decoding impossible.
That should also be quite apparent from the formula you use:
l = l * 47 + text.getBytes()[i] % 97;
For each character you get a number between 0 and 96, so you are already losing information in the modulo: it reduces the input to 97 possible values (e.g. you can no longer distinguish between the bytes 1 and 98 after the modulo). Then you multiply your long by a number less than 97 (47), so two consecutive characters overlap in terms of information distribution in the ciphertext.
Finally, after adding more and more characters, the long simply overflows and the topmost bits are lost.
In conclusion: if you ever want to decode the ciphertext again, fix the loss of information in these three places.
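Two of these losses are easy to observe directly. The sketch below reuses the formula from the question, with the charset pinned to US-ASCII as an assumption (the original relies on the platform default). The class LossyEncodingDemo and the helper overflows are hypothetical names; overflows repeats the computation with Math.multiplyExact and Math.addExact, which throw on overflow instead of silently wrapping.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class LossyEncodingDemo {

    // Same formula as in the question, charset fixed to US-ASCII (an assumption).
    static long encodeText(String text) {
        long l = 31;
        for (byte b : text.getBytes(StandardCharsets.US_ASCII)) {
            l = l * 47 + b % 97;
        }
        return l;
    }

    // True if the same computation leaves the range of a long.
    static boolean overflows(String text) {
        long l = 31;
        try {
            for (byte b : text.getBytes(StandardCharsets.US_ASCII)) {
                l = Math.addExact(Math.multiplyExact(l, 47L), (long) (b % 97));
            }
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 'b' is byte 98, and 98 % 97 == 1, the same as the byte 1.
        String one = String.valueOf((char) 1);
        System.out.println(encodeText("b") == encodeText(one)); // true

        System.out.println(overflows("stack"));         // false
        System.out.println(overflows("stackoverflow")); // true
    }
}
```

Because two different inputs map to the same long, no decodeText can recover both of them.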

Can I multiply charAt in Java?

When I try to multiply charAt results I get a "big" number:
String s = "25999993654";
System.out.println(s.charAt(0)+s.charAt(1));
Result: 103
But when I print only one character, it's OK.
The Java documentation for charAt says it returns:
the character at the specified index of this string. The first character is at index 0.
So I need an explanation or a solution (I think I should convert the string to int, but that seems like unnecessary work to me).

char is an integral type. The value of s.charAt(0) in your example is the char version of the number 50 (the character code for '2'). s.charAt(1) is (char)53. When you use + on them, they're converted to ints, and you end up with 103 (50 + 53), not 7.
If you're trying to use the numbers 2 and 5, yes, you'll have to parse them. Or if you know they're standard ASCII-style digits (character codes 48 through 57, inclusive), you can just subtract 48 from them (as 48 is the character code for '0'). Or better yet, as Peter Lawrey points out elsewhere, use Character.getNumericValue, which handles a broader range of characters.
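A short sketch comparing the options just described. The class DigitArithmetic and the helper digitAt are hypothetical names introduced here.

```java
// Hypothetical demo class; names are illustrative only.
final class DigitArithmetic {

    // Numeric value of the char at index, as reported by Character.getNumericValue.
    static int digitAt(String s, int index) {
        return Character.getNumericValue(s.charAt(index));
    }

    public static void main(String[] args) {
        String s = "25999993654";

        // chars are promoted to int, so this adds the character codes 50 and 53.
        System.out.println(s.charAt(0) + s.charAt(1));                 // 103

        // Subtracting '0' (code 48) works for the ASCII digits 0 through 9.
        System.out.println((s.charAt(0) - '0') + (s.charAt(1) - '0')); // 7

        System.out.println(digitAt(s, 0) + digitAt(s, 1));             // 7
        System.out.println(digitAt(s, 0) * digitAt(s, 1));             // 10
    }
}
```

Character.getNumericValue also maps the letters a through z to 10 through 35, so Character.digit(c, 10), which returns -1 for anything that is not a decimal digit, may be the safer choice when the input is not known to be numeric.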

Yes, you should parse the extracted digit, or use the ASCII chart and subtract 48:
public final class Test {
    public static void main(String[] a) {
        String s = "25999993654";
        System.out.println(intAt(s, 0) + intAt(s, 1));
    }

    public static int intAt(String s, int index) {
        return Integer.parseInt("" + s.charAt(index));
        // or
        // return (int) s.charAt(index) - 48;
    }
}

How to compute "binary values addition of characters" for checksum

Working in Java, here is the specification I have in order to implement a checksum calculation on character messages:
8.3.3 Checksum—The checksum permits the receiver to detect a defective frame. The checksum is encoded as two characters which are sent after the <ETB> or <ETX> character. The checksum is computed by adding the binary values of the characters, keeping the least significant eight bits of the result.
8.3.3.1 The checksum is initialized to zero with the <STX> character. The first character used in computing the checksum is the frame number. Each character in the message text is added to the checksum (modulo 256). The computation for the checksum does not include <STX>, the checksum characters, or the trailing <CR> and <LF>.
8.3.3.2 The checksum is an integer represented by eight bits, it can be considered as two groups of four bits. The groups of four bits are converted to the ASCII characters of the hexadecimal representation. The two ASCII characters are transmitted as the checksum, with the most significant character first.
8.3.3.3 For example, a checksum of 122 can be represented as 01111010 in binary or 7A in hexadecimal. The checksum is transmitted as the ASCII character 7 followed by the character A.
Here is what I have understood and implemented, but it doesn't seem to be working:
private void computeAndAddChecksum(byte[] bytes, OutputStream outputStream) {
    logBytesAsBinary(bytes);
    long checksum = 0;
    for (int i = 0; i < bytes.length; i++) {
        checksum += (bytes[i] & 0xffffffffL);
    }
    int integerChecksum = (int)checksum;
    String hexChecksum = Integer.toHexString(integerChecksum).toUpperCase();
    logger.info("Checksum for "+new String(bytes)+" is "+checksum+" in hexa: "+hexChecksum);
    try {
        if (outputStream != null)
        {
            outputStream.write(hexChecksum.getBytes());
        }
    } catch (IOException e) {
        logger.error(e.getMessage());
    }
}
Do you have any idea why this snippet does not conform to the specification?
Here is an example I was given, in case it helps:
<STX>3L|1<CR><ETX>3C<CR><LF>
so the checksum of
3L|1<CR><ETX>
should be
3C
Thank you very much for your help.

Your specification says:
the checksum should be initialized with frame number.
Here is a snippet that returns the expected result, but I don't know where the frame number comes from (surely elsewhere in your spec):
public class ChecksumBuilder {
    public static String getFrameCheckSum(String frametext, int framenum)
    {
        byte[] a = frametext.getBytes();
        int checksum = framenum;
        for (int i = 0; i < a.length; i++)
        {
            checksum += a[i];
        }
        String out = String.format("%02x", (checksum & 0xFF)).toUpperCase();
        return out;
    }

    public static void main(String[] args)
    {
        System.out.print(ChecksumBuilder.getFrameCheckSum("3L|1<CR>", 1));
    }
}
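Another possible reading of sections 8.3.3 and 8.3.3.1 quoted in the question: the checksum starts at zero, and the frame number is simply the first character that gets added (the leading 3 in the example frame). Under that assumption, and treating <CR> and <ETX> as the single control characters 13 and 3 instead of literal text, a sketch could look as follows. The class FrameChecksum and the method checksum are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of one reading of the quoted spec; not a reference implementation.
final class FrameChecksum {

    // Sums every byte from the frame number through <ETX> (or <ETB>),
    // keeps the low eight bits and renders them as two uppercase hex digits.
    static String checksum(String frameBody) {
        int sum = 0; // 8.3.3.1: initialized to zero
        for (byte b : frameBody.getBytes(StandardCharsets.US_ASCII)) {
            sum += b & 0xFF; // treat each byte as unsigned
        }
        return String.format("%02X", sum & 0xFF); // modulo 256
    }

    public static void main(String[] args) {
        // "3L|1" followed by <CR> (13) and <ETX> (3) as real control characters.
        String body = "3L|1" + (char) 13 + (char) 3;
        System.out.println(checksum(body)); // 3C
    }
}
```

For the example frame the byte values are 51 + 76 + 124 + 49 + 13 + 3 = 316, and 316 modulo 256 is 60, which is 3C in hexadecimal, matching the checksum given in the question.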

Converting integers into bytes

How do the following numbers, when converted to bytes, give the results on the right-hand side? I assumed that converting an integer to a byte array would convert each digit of the number into its corresponding 4-byte array, but I cannot make sense of the following:
727 = 000002D7
1944 = 00000798
42 = 0000002A
EDIT: I was reading a blog where I found the following lines:
If we are working with integer column names, for example, then each column name is 4 bytes long. Lets work with column names 727, 1944 and 42.
The bytes associated with these three numbers:
727 = 000002D7
1944 = 00000798
42 = 0000002A
link to this blog: http://www.divconq.com/2010/why-does-cassandra-have-data-types/

Solution
The following will give you the exact output as in your example:
public class Main
{
    public static void main(final String[] args)
    {
        System.out.format("%08X\n", 727);
        System.out.format("%08X\n", 1944);
        System.out.format("%08X\n", 42);
    }
}
and here is the expected output:
000002D7
00000798
0000002A
Explanation
How the Formatter works: reading the format string from right to left, X = format as uppercase hexadecimal, 08 = pad on the left with zeros to a width of eight characters, and % marks the beginning of the format specifier.
You can also use String.format("%08X", 727); to accomplish the same thing.
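The formatting above prints the value as hex digits; to obtain the four bytes themselves, one option is java.nio.ByteBuffer, whose default byte order is big-endian. A minimal sketch, with IntBytes, toBytes and toHex as hypothetical names:

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; names are illustrative only.
final class IntBytes {

    // The four bytes of a Java int, most significant byte first.
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Renders each byte as two uppercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(toBytes(727)));  // 000002D7
        System.out.println(toHex(toBytes(1944))); // 00000798
        System.out.println(toHex(toBytes(42)));   // 0000002A
    }
}
```

This shows that the whole number, not each decimal digit, is stored in four bytes: 727 is 2 * 256 + 215, so the last two bytes are 02 and D7.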
