append data while double looping of record - java

I have data like given below.
11
13
15
17
25
26
29
30
17
25
26
29
30
25
26
29
30
25
26
29
30
17
25
26
29
30
17
19
This data contains two groups of records. The first group (15, 17) can occur only once, and only at the 3rd and 4th positions. The second group (25, 26, 29, 30) can occur multiple times.
Record 17 acts as a break: after it a new group starts, and that group can contain multiple subgroups (25, 26, 29, 30).
I want to prefix two characters to the record whenever a new group starts, and keep incrementing the subgroup counter until the breaker record (17) occurs.
I hope that is clear.
My expected output looks like this:
11
13
1115
17
2125
26
29
30
17
3125
26
29
30
n225
26
29
30
n325
26
29
30
17
4125
26
29
30
17
19
This is what I have tried so far, as abstract pseudocode, but I am not able to handle the subgroups marked with the n character:
int line=0
int seq =0
if (in.equal =15 or 25) {
line++
seq++
context.line = line++ // variable to store line variable
context.seq = seq++ // variable to store seq variable
out = context.line+context.seq+in // out is output record and in is input record mentioned in above data
flag = 1
}
else (if in.equal = 17 and flag = 1){
flag = 0
seq=0
}
Any suggestions, please?
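One possible way to handle the n subgroups is to keep a single subgroup counter that the breaker record resets. The sketch below is only an illustration built on assumptions read off the expected output above: records 15 and 25 start a subgroup, record 17 is the breaker, the first subgroup after a breaker gets the prefix made of the group number followed by 1, and every later subgroup gets n followed by the subgroup number. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupPrefixer {

    // Adds the two-character prefix to every record that starts a subgroup.
    static List<String> addPrefixes(List<String> records) {
        List<String> out = new ArrayList<>();
        int group = 0; // running group number, never reset
        int sub = 0;   // subgroups seen since the last breaker; 0 means none yet
        for (String rec : records) {
            if (rec.equals("15") || rec.equals("25")) {
                sub++;
                if (sub == 1) {
                    group++;                    // first subgroup after a breaker opens a new group
                    out.add(group + "1" + rec); // e.g. 1115, 2125, 3125
                } else {
                    out.add("n" + sub + rec);   // e.g. n225, n325
                }
            } else {
                if (rec.equals("17")) {
                    sub = 0;                    // breaker: the next 15 or 25 opens a new group
                }
                out.add(rec);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList(
                "11 13 15 17 25 26 29 30 17 25 26 29 30 25 26 29 30 17 19".split(" "));
        addPrefixes(in).forEach(System.out::println);
    }
}
```

Note that the prefix stays two characters long only while the group and subgroup numbers are below 10; the question does not say what should happen beyond that.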

Related

How to generate base64 string from Java to C#?

I am trying to convert a Java function in C#. Here is the original code:
class SecureRandomString {
private static SecureRandom random = new SecureRandom();
private static Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();
public static String generate(String seed) {
byte[] buffer;
if (seed == null) {
buffer = new byte[20];
random.nextBytes(buffer);
}
else {
buffer = seed.getBytes();
}
return encoder.encodeToString(buffer);
}
}
And here is what I did in C#:
public class Program
{
private static readonly Random random = new Random();
public static string Generate(string seed = null)
{
byte[] buffer;
if (seed == null)
{
buffer = new byte[20];
random.NextBytes(buffer);
}
else
{
buffer = Encoding.UTF8.GetBytes(seed);
}
return System.Web.HttpUtility.UrlPathEncode(RemovePadding(Convert.ToBase64String(buffer)));
}
private static string RemovePadding(string s) => s.TrimEnd('=');
}
I wrote some testcases:
Assert(Generate("a"), "YQ");
Assert(Generate("ab"), "YWI");
Assert(Generate("abc"), "YWJj");
Assert(Generate("abcd"), "YWJjZA");
Assert(Generate("abcd?"), "YWJjZD8");
Assert(Generate("test wewqe_%we()21-3012"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI");
Assert(Generate("test wewqe_%we()21-3012_"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTJf");
Assert(Generate("test wewqe_%we()21-3012/"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTIv");
Assert(Generate("test wewqe_%we()21-3012!"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTIh");
Assert(Generate("test wewqe_%we()21-3012a?"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTJhPw");
And everything works fine, until I try the following one:
Assert(Generate("test wewqe_%we()21-3012?"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI_");
My code outputs dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI/ instead of the expected dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI_. Why?
I think that the culprit is the encoder. The original code configures its encoder with Base64.getUrlEncoder().withoutPadding(). The withoutPadding() part is basically a TrimEnd('='), but I am not sure how to reproduce getUrlEncoder().
I looked into this handy conversion table, URL Encoding using C#, without finding anything for my case.
I tried HttpUtility.UrlEncode, but the output is not right.
What did I miss?
According to Oracle documentation, here is what getUrlEncoder() does:
Returns a Base64.Encoder that encodes using the URL and Filename safe type base64 encoding scheme.
All right, so what is "URL and Filename safe"? Once more, the documentation helps:
Uses the "URL and Filename safe Base64 Alphabet" as specified in Table 2 of RFC 4648 for encoding and decoding. The encoder does not add any line feed (line separator) character. The decoder rejects data that contains characters outside the base64 alphabet.
We can now look up RFC 4648 online. Here is Table 2:
Table 2: The "URL and Filename safe" Base 64 Alphabet
Value Encoding Value Encoding Value Encoding Value Encoding
0 A 17 R 34 i 51 z
1 B 18 S 35 j 52 0
2 C 19 T 36 k 53 1
3 D 20 U 37 l 54 2
4 E 21 V 38 m 55 3
5 F 22 W 39 n 56 4
6 G 23 X 40 o 57 5
7 H 24 Y 41 p 58 6
8 I 25 Z 42 q 59 7
9 J 26 a 43 r 60 8
10 K 27 b 44 s 61 9
11 L 28 c 45 t 62 - (minus)
12 M 29 d 46 u 63 _ (underline)
13 N 30 e 47 v
14 O 31 f 48 w
15 P 32 g 49 x
16 Q 33 h 50 y (pad) =
It is an encoding table: a 6-bit value of 0 outputs A, a value of 42 outputs q, and so on.
Let's compare it with the standard alphabet, Table 1:
Table 1: The Base 64 Alphabet
Value Encoding Value Encoding Value Encoding Value Encoding
0 A 17 R 34 i 51 z
1 B 18 S 35 j 52 0
2 C 19 T 36 k 53 1
3 D 20 U 37 l 54 2
4 E 21 V 38 m 55 3
5 F 22 W 39 n 56 4
6 G 23 X 40 o 57 5
7 H 24 Y 41 p 58 6
8 I 25 Z 42 q 59 7
9 J 26 a 43 r 60 8
10 K 27 b 44 s 61 9
11 L 28 c 45 t 62 +
12 M 29 d 46 u 63 /
13 N 30 e 47 v
14 O 31 f 48 w (pad) =
15 P 32 g 49 x
16 Q 33 h 50 y
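Note that the two tables are identical except for two entries:
value 62 is encoded as '-' instead of '+'
value 63 is encoded as '_' instead of '/'
You should be able to fix your problem by applying these two replacements to the output of Convert.ToBase64String: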
private static string Encode(string s) => s.Replace("+", "-").Replace("/", "_");
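The same relationship can be checked on the Java side. The sketch below is only an illustration (the class and method names are made up): it encodes with the standard alphabet, swaps the two differing characters and strips the padding by hand, and compares the result with the built-in URL-safe encoder from the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class UrlSafeBase64Demo {

    // Standard Base64, then the two character swaps from RFC 4648 Table 2, then no padding.
    static String manual(String seed) {
        String standard = Base64.getEncoder()
                .encodeToString(seed.getBytes(StandardCharsets.UTF_8));
        return standard.replace('+', '-').replace('/', '_').replaceAll("=+$", "");
    }

    // The encoder configuration used by the original Java code.
    static String builtIn(String seed) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(seed.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String seed = "test wewqe_%we()21-3012?";
        System.out.println(manual(seed));
        System.out.println(builtIn(seed));
    }
}
```

Both methods should print the same string for the failing test case, ending in _ rather than /.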

Kotlin String max length? (Kotlin file with a long String is not compiling)

According to this answer, a Java String can hold up to 2^31 - 1 characters. I was doing some benchmarking, so I tried to generate a large amount of text and write it to a file like this:
import java.io.*
fun main() {
val out = File("output.txt").apply { createNewFile() }.printWriter()
sequence {
var x = 0
while (true) {
yield("${++x} ${++x} ${++x} ${++x} ${++x}")
}
}.take(5000).forEach { out.println(it) }
out.close()
}
The output.txt file then looks like this:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
// ... 5000 lines
I then copied the whole content of the file into a string literal to benchmark some functions, so the code looks like this:
import kotlin.system.*
fun main() {
val inputs = """
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
// ... 5000 lines
24986 24987 24988 24989 24990
24991 24992 24993 24994 24995
24996 24997 24998 24999 25000
""".trimIndent()
measureNanoTime {
inputs.reader().forEachLine { line ->
val (a, b, c, d, e) = line.split(" ").map(String::toInt)
}
}.div(5000).let(::println)
}
The total character count of the file/string is 138894.
A String can hold up to 2147483647 characters.
But the Kotlin code (the last file) does not compile. It throws a compilation error:
e: org.jetbrains.kotlin.codegen.CompilationException: Back-end (JVM) Internal error: wrong bytecode generated
// more lines
The root cause java.lang.IllegalArgumentException was thrown at: org.jetbrains.org.objectweb.asm.ByteVector.putUTF8(ByteVector.java:246)
at org.jetbrains.kotlin.codegen.TransformationMethodVisitor.visitEnd(TransformationMethodVisitor.kt:92)
at org.jetbrains.kotlin.codegen.FunctionCodegen.endVisit(FunctionCodegen.java:971)
... 43 more
Caused by: java.lang.IllegalArgumentException: UTF8 string too large
Here is the total log of exception with stacktrace: https://gist.github.com/Animeshz/1a18a7e99b0c0b913027b7fb36940c07
The Java class file format imposes a limit: the length of a string constant must fit in 16 bits, so 65535 bytes (not characters) is the maximum length of a string literal in source code.
See The class File Format in the Java Virtual Machine Specification.
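To make the numbers concrete, here is a small Java sketch (the class and method names are hypothetical). It rebuilds the same 5000-line input at runtime and compares its encoded size with the 65535-byte limit. It measures standard UTF-8, which is only an approximation of the modified UTF-8 used in class files, although the two agree for the ASCII digits used here.

```java
import java.nio.charset.StandardCharsets;

class StringConstantLimitDemo {

    // Largest byte length a string constant can have in a class file (16-bit length field).
    static final int MAX_CONSTANT_BYTES = 65535;

    // Rebuilds the input from the question: five consecutive integers per line.
    static String buildInput(int lineCount) {
        StringBuilder sb = new StringBuilder();
        int x = 0;
        for (int i = 0; i < lineCount; i++) {
            for (int j = 0; j < 5; j++) {
                if (j > 0) {
                    sb.append(' ');
                }
                sb.append(++x);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Approximate check: would this text fit into a single string constant?
    static boolean fitsInClassFile(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length <= MAX_CONSTANT_BYTES;
    }

    public static void main(String[] args) {
        String input = buildInput(5000);
        System.out.println(input.length());         // character count of the generated text
        System.out.println(fitsInClassFile(input)); // false: too large for one literal
    }
}
```

Because the limit applies only to constants stored in the class file, building the text at runtime as above, or reading output.txt at runtime instead of pasting it into the source, avoids the error.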

Java Mapper: finding the number of customers who purchased both item IDs 21 and 27

I would like to write a map function for a retail dataset, with a long integer offset as the input key and a line of text as the input value. The output key of the map is the text Both_21_27 and the output value is the constant integer 1.
In the map, two boolean variables item_21 and item_27 should be created and initialized to false. After converting the value to a string, a StringTokenizer is used to split the string into tokens.
Each token has to be checked to see whether it matches 21 or 27. If there is a match, the corresponding boolean variable is set to true. A switch statement can be used for the check.
After going through all the tokens, the map checks whether both boolean variables are true. If both are false, or only one is true, an else return; statement should be used to skip the transaction and move on to the next.
A sample retail dataset is shown below:
2 7 15 21 32 41
5 14 19 21 25 27 45 57 62 75 80
1 3 7 15 19 21 26 27 35 44 54
2 9 16 24 35 41 49 57 68 72 88
4 23 31 33 42 45 67 73 92
9 12 18 21 22 24 27 43 74
15 19 45 47 53 58 64 79 83 94 99 107
3 7 15 17 21 23 26 27 33 42 44 47 49 55 62 77 82
Here is what I tried so far:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class RetailMapper
extends Mapper<LongWritable, Text, Text, IntWritable> {
private Text Both_21_27 = new Text();
private final static IntWritable one = new IntWritable(1);
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
boolean item_21 = false;
boolean item_27 = false;
StringTokenizer item = new StringTokenizer(value.toString());
while (item.hasMoreTokens()) {
Both_21_27.set(item.nextToken());
context.write(Both_21_27, one);
}
switch(item) {
case 21 :
item_21 = true;
break;
case 27 :
item_27 = true;
break;
}
if (item_21 = true && item_27 = true) {
context.write(Both_21_27, one);
else return;
}
}
}
I am stuck with this map function. Any help, advice, suggestions?
I think you need to move the switch block into the while loop, read each token only once, and write the output only after the loop:
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
boolean item_21 = false;
boolean item_27 = false;
StringTokenizer item = new StringTokenizer(value.toString());
while (item.hasMoreTokens()) {
// nextToken() returns a String, so switch on the string value
switch (item.nextToken()) {
case "21":
item_21 = true;
break;
case "27":
item_27 = true;
break;
}
}
if (item_21 && item_27) {
// emit the fixed key once per transaction that contains both items
Both_21_27.set("Both_21_27");
context.write(Both_21_27, one);
}
else return;
}
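The per-line check can also be tried without a Hadoop cluster. The following is a minimal sketch in plain Java, assuming one transaction per line as in the sample dataset; the class and method names are hypothetical and nothing here is part of the Hadoop API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class BothItemsCheck {

    // Returns true when the transaction line contains both item 21 and item 27.
    static boolean containsBoth(String transaction) {
        boolean item21 = false;
        boolean item27 = false;
        StringTokenizer tokens = new StringTokenizer(transaction);
        while (tokens.hasMoreTokens()) {
            switch (tokens.nextToken()) {
                case "21":
                    item21 = true;
                    break;
                case "27":
                    item27 = true;
                    break;
                default:
                    break; // any other item is ignored
            }
        }
        return item21 && item27;
    }

    // Counts the matching transactions, which is what summing the emitted 1s would give.
    static long countBoth(List<String> transactions) {
        return transactions.stream().filter(BothItemsCheck::containsBoth).count();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "2 7 15 21 32 41",
                "5 14 19 21 25 27 45 57 62 75 80",
                "4 23 31 33 42 45 67 73 92");
        System.out.println(countBoth(sample));
    }
}
```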

Android regular expression that split into Array of string

I want a regular expression that matches the following:
String myString =
"11 22 01 02 22 11
11 22 31 32 22 11
11 22 51 42 22 11 ......"
I want to match both the starting 11 22 and the ending 22 11 sequence, and I also want to split the string into an array of 01 02, 31 32, 51 42, ....
String[] resultArray = myString.split("11 22 .* 22 11");
I am only getting an array of empty strings, with one entry per 11 22 xxx 22 11 sequence.
You can use groups for that purpose.
Pattern p = Pattern.compile("MY TEXT (.*) MY TEXT MY TEXT (.*) My TEXT");
Matcher m = p.matcher("MY TEXT hello you MY TEXT MY TEXT are here My TEXT");
if (m.find()) {
System.out.println(m.group(1)); // prints 'hello you'
System.out.println(m.group(2)); // prints 'are here'
}
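Applied to the data from the question, the group idea might look like the sketch below. It is an illustration only: the class and method names are hypothetical, and it assumes each sequence sits on its own line separated by "\n". split() treats the whole pattern as a delimiter and throws it away, which is probably why only empty pieces were left; a capturing group with find() keeps the middle part instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiddleExtractor {

    // Reluctant (.*?) stops at the first " 22 11" that follows the opening "11 22 ".
    private static final Pattern FRAME = Pattern.compile("11 22 (.*?) 22 11");

    // Collects the text between every "11 22" ... "22 11" pair.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = FRAME.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String myString = "11 22 01 02 22 11\n11 22 31 32 22 11\n11 22 51 42 22 11";
        String[] resultArray = extract(myString).toArray(new String[0]);
        for (String s : resultArray) {
            System.out.println(s);
        }
    }
}
```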

how to put String values into new Line on getting spaces in java

Hi guys!
I have a string with values like 69 17 17 16 2 1 1 26 26 56 56 69 20 19 20 etc. I need to put these values into a new String with each value on its own line, since the values are separated by spaces.
Any help will be highly appreciated.
Thanks in advance.
String newStr = origStr.replaceAll(" ", "\n");
You should split the String on a specific separator into an array or list.
Then print out the values in the required format.
This helps when, tomorrow, the String contains digits, decimals, text, etc., or the text is wanted in another format.
String source = "69 17 17 16 2 1 1 26 26 56 56 69 20 19 20";
String[] splitted = source.split(" ");
StringBuilder sb = new StringBuilder();
for (String split : splitted){
sb.append(split).append(System.getProperty("line.separator"));
}
System.out.println(sb.toString());
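A more compact variant of the same split-then-format idea is sketched below. It assumes the values may be separated by one or more whitespace characters, and the class and method names are made up for illustration.

```java
class LinePerValue {

    // Splits on runs of whitespace and joins the values with the platform line separator.
    static String onePerLine(String source) {
        return String.join(System.lineSeparator(), source.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(onePerLine("69 17 17 16 2 1 1 26 26 56 56 69 20 19 20"));
    }
}
```

Unlike the StringBuilder loop, this version does not add a line separator after the last value.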
