How can I translate a Spark Client submitApplication to Yarn Rest API? - java

Currently I have a working code implementation of submitting an application to Yarn using spark.deploy.yarn.Client. It's complex to aggregate all the arguments this client needs, but the submission of the application is simple:
ClientArguments cArgs = new ClientArguments(args.toArray(new String[0]));
client = new Client(cArgs, sparkConf);
applicationID = client.submitApplication();
Most of the code before this point was accumulating the sparkConf and args. Now I wish to retire the Client and work with REST only. YARN's ResourceManager offers a full REST API, including application submission - according to the YARN documentation it's a matter of this simple JSON/XML POST:
POST http://<rm http address:port>/ws/v1/cluster/apps
Accept: application/json
Content-Type: application/json
{
  "application-id":"application_1404203615263_0001",
  "application-name":"test",
  "am-container-spec":
  {
    "local-resources":
    {
      "entry":
      [
        {
          "key":"AppMaster.jar",
          "value":
          {
            "resource":"hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar",
            "type":"FILE",
            "visibility":"APPLICATION",
            "size": 43004,
            "timestamp": 1405452071209
          }
        }
      ]
    },
    "commands":
    {
      "command":"{{JAVA_HOME}}/bin/java -Xmx10m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr"
    },
    "environment":
    {
      "entry":
      [
        {
          "key": "DISTRIBUTEDSHELLSCRIPTTIMESTAMP",
          "value": "1405459400754"
        },
        {
          "key": "CLASSPATH",
          "value": "{{CLASSPATH}}<CPS>./*<CPS>{{HADOOP_CONF_DIR}}<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/*<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/lib/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/lib/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/lib/*<CPS>./log4j.properties"
        },
        {
          "key": "DISTRIBUTEDSHELLSCRIPTLEN",
          "value": "6"
        },
        {
          "key": "DISTRIBUTEDSHELLSCRIPTLOCATION",
          "value": "hdfs://hdfs-namenode:9000/user/testuser/demo-app/shellCommands"
        }
      ]
    }
  },
  "unmanaged-AM":false,
  "max-app-attempts":2,
  "resource":
  {
    "memory":1024,
    "vCores":1
  },
  "application-type":"YARN",
  "keep-containers-across-application-attempts":false,
  "log-aggregation-context":
  {
    "log-include-pattern":"file1",
    "log-exclude-pattern":"file2",
    "rolled-log-include-pattern":"file3",
    "rolled-log-exclude-pattern":"file4",
    "log-aggregation-policy-class-name":"org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AllContainerLogAggregationPolicy",
    "log-aggregation-policy-parameters":""
  },
  "attempt-failures-validity-interval":3600000,
  "reservation-id":"reservation_1454114874_1",
  "am-black-listing-requests":
  {
    "am-black-listing-enabled":true,
    "disable-failure-threshold":0.01
  }
}
I tried to translate my arguments into the JSON body of this POST request, but it seems impossible. Does anyone know how I can reverse-engineer the JSON payload to send via REST from an application I already submitted with the Client? Or what mapping I could use to take the Client arguments and place them in the JSON?

After a little searching, I managed to submit an application from the REST API only. It's not a well-documented process, so I'm posting it here.
NOTE: if at any time you wish to compare the content of the request to the request sent by the client, use debug breakpoints to inspect the application context used by the Client.
Open the class org.apache.hadoop.yarn.client.api.impl.YarnClientImpl and go to the method submitApplication(ApplicationSubmissionContext appContext).
First, to replace spark.deploy.yarn.Client with a REST API request, the solution must make sure all the files mentioned in the configuration are available in HDFS.
Then it needs to compose and upload one extra file, __spark_conf__.zip.
Step 1
Go over the files from the SparkConf (the Client's second argument): the files in the "AllJars" tag, the file in "mainJarPath", and the files in "FilesList".
For each file, check whether it exists in HDFS and, if not, upload it from the local machine. Then get its FileStatus from HDFS.
Aggregate the resources list: an attribute map for each file containing these six attributes (a minimal sketch follows this list):
size = getLen()
timestamp = getModificationTime()
type=FILE
visibility=PUBLIC
The two remaining attributes are key and resource:
Files from the allJars list: the key is spark_libs/{{filename}}, and the resource is the filename.
Files from the FilesList: the key is the "localEntry" tag, and the resource is the "hdfsPath" tag.
The file in mainJarPath: the key is "app.jar", and the resource is the filename.
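To illustrate Step 1, here is a minimal sketch (my own helper, not the Client's actual code) that builds the resources list for files already uploaded to HDFS. Only FileStatus.getLen() and getModificationTime() come from the Hadoop API; the class name, parameter names, and map keys are assumptions:
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalResourceCollector {
    // keyToHdfsPath maps the resource key (e.g. "spark_libs/myLib.jar" or "app.jar")
    // to the HDFS path of the already-uploaded file.
    public static List<Map<String, String>> collect(FileSystem fs, Map<String, String> keyToHdfsPath) throws IOException {
        List<Map<String, String>> localResources = new ArrayList<>();
        for (Map.Entry<String, String> entry : keyToHdfsPath.entrySet()) {
            FileStatus status = fs.getFileStatus(new Path(entry.getValue()));
            Map<String, String> resource = new HashMap<>();
            resource.put("key", entry.getKey());
            resource.put("resource", entry.getValue());
            resource.put("size", String.valueOf(status.getLen()));
            resource.put("timestamp", String.valueOf(status.getModificationTime()));
            resource.put("type", "FILE");
            resource.put("visibility", "PUBLIC");
            localResources.add(resource);
        }
        return localResources;
    }
}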
Step 2
Creating the __spark_conf__.zip file. You can create it directly in HDFS, in the staging path, which is usually {{HDFS_base_folder}}/user/{{username}}/.sparkStaging/{{application_id}}/__spark_conf__.zip.
This archive contains two files and one empty directory: __spark_hadoop_conf__.xml (a renamed copy of core-site.xml) and __spark_conf__.properties, which is a slightly modified version of the sparkConf section from the configuration.
To create __spark_conf__.properties you will need to read the JSON map from "sparkConf" -> "org$apache$spark$SparkConf$$settings" and convert each entry from the JSON format "spark.safemine.addcontrol.driverMemory": "5120M"
to the properties format spark.safemine.addcontrol.driverMemory=5120M.
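For illustration, assuming the settings object has already been parsed into a plain Map<String,String> (the JSON parsing itself is not shown here), the conversion is just:
// sparkConfSettings is assumed to hold the parsed
// "org$apache$spark$SparkConf$$settings" entries.
String properties = "";
for (Map.Entry<String, String> setting : sparkConfSettings.entrySet()) {
    properties += setting.getKey() + "=" + setting.getValue() + "\n";
}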
To the bottom of the file add 6 new lines:
spark.yarn.cache.confArchive={{the location to which you will upload __spark_conf__.zip in the sparkStaging}}
spark.yarn.cache.visibilities={{all the visibilities of the files, comma delimited - basically "PUBLIC,PUBLIC, ... ,PUBLIC"}}
spark.yarn.cache.timestamps={{All the timestamps for the files, comma delimited}}
spark.yarn.cache.types={{all the types of the files, comma delimited - basically "FILE,FILE, ... ,FILE"}}
spark.yarn.cache.filenames={{All the filenames and keys, recorded as resource#key and comma delimited}}
spark.yarn.cache.sizes={{All the sizes for the files, comma delimited}}
Make sure the five aggregated lines list the files in the same respective order. I used this code:
String confArchive = "spark.yarn.cache.confArchive="+hdfs+"/user/"+userName+"/.sparkStaging/"+applicationId+"/__spark_conf__.zip";
String filenames = "spark.yarn.cache.filenames=";
String sizes = "spark.yarn.cache.sizes=";
String timestamps = "spark.yarn.cache.timestamps=";
String types = "spark.yarn.cache.types=";
String visibilities = "spark.yarn.cache.visibilities=";
for (Map<String, String> localResource : localResources) {
    filenames += localResource.get("resource") + "#" + localResource.get("key") + ",";
    sizes += localResource.get("size") + ",";
    timestamps += localResource.get("timestamp") + ",";
    types += localResource.get("type") + ",";
    visibilities += localResource.get("visibility") + ",";
}
properties += confArchive + "\n";
properties += filenames.substring(0, filenames.length() - 1) + "\n";
properties += sizes.substring(0, sizes.length() - 1) + "\n";
properties += timestamps.substring(0, timestamps.length() - 1) + "\n";
properties += types.substring(0, types.length() - 1) + "\n";
properties += visibilities.substring(0, visibilities.length() - 1) + "\n";
The __spark_hadoop_conf__.xml file is simply a renamed copy of core-site.xml, and the directory created alongside the two files is named __hadoop_conf__ and is left empty.
You can save the files to HDFS directly like so:
private void generateSparkConfInHdfs(String applicationId, String userName, String sparkConfProperties, String sparkHadoopConf) throws IOException {
    String path = hdfs + "/user/" + userName + "/.sparkStaging/" + applicationId + "/__spark_conf__.zip";
    Path hdfsPath = new Path(path);
    ZipOutputStream os = new ZipOutputStream(getHdfs().create(hdfsPath));
    os.putNextEntry(new ZipEntry("__hadoop_conf__/"));
    os.putNextEntry(new ZipEntry("__spark_conf__.properties"));
    os.write(sparkConfProperties.getBytes(), 0, sparkConfProperties.getBytes().length);
    os.putNextEntry(new ZipEntry("__spark_hadoop_conf__.xml"));
    os.write(sparkHadoopConf.getBytes(), 0, sparkHadoopConf.getBytes().length);
    os.close();
}
After you finish creating the file, add it to the resources list with these specifications (as sketched after the list):
size = getLen()
timestamp = getModificationTime()
type = ARCHIVE
visibility = PRIVATE
key = __spark_conf__
resource = the full staging path (usually {{HDFS_base_folder}}/user/{{username}}/.sparkStaging/{{application_id}}/__spark_conf__.zip).
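As a sketch, appending the archive to the same map-based localResources list used above (the map representation is my own; getHdfs() is the same helper used in the zip code):
// confArchivePath is the full staging path of __spark_conf__.zip.
FileStatus confStatus = getHdfs().getFileStatus(new Path(confArchivePath));
Map<String, String> confResource = new HashMap<>();
confResource.put("key", "__spark_conf__");
confResource.put("resource", confArchivePath);
confResource.put("size", String.valueOf(confStatus.getLen()));
confResource.put("timestamp", String.valueOf(confStatus.getModificationTime()));
confResource.put("type", "ARCHIVE");
confResource.put("visibility", "PRIVATE");
localResources.add(confResource);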
Go over the full resources list and create an XML/JSON entry for each resource with this structure, filling the {{}} placeholders with the values we collected:
<entry>
  <key>{{key}}</key>
  <value>
    <resource>{{resource}}</resource>
    <size>{{size}}</size>
    <timestamp>{{timestamp}}</timestamp>
    <type>{{type}}</type>
    <visibility>{{visibility}}</visibility>
  </value>
</entry>
The accumulated string will be your localResources XML segment shown below.
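A rough sketch of that accumulation loop, reusing the map-based list from above (the variable names are mine):
StringBuilder localResourcesXml = new StringBuilder();
for (Map<String, String> r : localResources) {
    localResourcesXml.append("<entry>")
            .append("<key>").append(r.get("key")).append("</key>")
            .append("<value>")
            .append("<resource>").append(r.get("resource")).append("</resource>")
            .append("<size>").append(r.get("size")).append("</size>")
            .append("<timestamp>").append(r.get("timestamp")).append("</timestamp>")
            .append("<type>").append(r.get("type")).append("</type>")
            .append("<visibility>").append(r.get("visibility")).append("</visibility>")
            .append("</value>")
            .append("</entry>");
}
// This string is what gets concatenated into <local-resources> in Step 4.
String localResourcesSegment = localResourcesXml.toString();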
Step 3
Generating the Java command. You will need to extract a few elements from the SparkConfig:
driverMemory - from the same attribute in the sparkConf
extraJavaOptions - from spark.driver.extraJavaOptions within the attribute collection
mainClass - from the same attribute in the sparkConf
argstr - collect all the ClientArgs except the --class one.
The resulting command, with the elements included, is:
String command = "$JAVA_HOME/bin/java -server -Xmx"+driverMemory+" -Djava.io.tmpdir=$PWD/tmp "+extraJavaOptions+" -Dspark.yarn.app.container.log.dir=<LOG_DIR> "
+ "org.apache.spark.deploy.yarn.ApplicationMaster --class "+mainClass+" "+argstr+" "
+ "--properties-file $PWD/__spark_conf__/__spark_conf__.properties 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr";
Step 4
Compiling the request XML.
NOTE: my implementation requires a label on the AM container, so am-container-node-label-expression is added. This will not be applicable in all cases.
The mapping from the sparkConf to the REST request is (shown here as XML; a JSON implementation is also supported):
<application-submission-context>
  <application-id>"+applicationId+"</application-id>
  <application-name>"+appName+"</application-name>
  <queue>default</queue>
  <priority>0</priority>
  <am-container-spec>
    <local-resources>"+localResources+"</local-resources>
    <environment>
      <entry>
        <key>SPARK_YARN_STAGING_DIR</key>
        <value>"+hdfs+"/user/"+userName+"/.sparkStaging/"+applicationId+"</value>
      </entry>
      <entry>
        <key>CLASSPATH</key>
        <value>$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:/spark-non-hdfs-storage/spark-assembly-2.3.0-hadoop2.7/*:%HADOOP_CONF_DIR%:%HADOOP_COMMON_HOME%/share/hadoop/common/*:%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*:%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*:%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*:%HADOOP_YARN_HOME%/share/hadoop/yarn/*:%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*:%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*:%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*:$PWD/__spark_conf__/__hadoop_conf__</value>
      </entry>
      <entry>
        <key>SPARK_USER</key>
        <value>"+userName+"</value>
      </entry>
    </environment>
    <commands>
      <command>"+command+"</command>
    </commands>
  </am-container-spec>
  <unmanaged-AM>false</unmanaged-AM>
  <max-app-attempts>1</max-app-attempts>
  <resource>
    <memory>5632</memory>
    <vCores>1</vCores>
  </resource>
  <application-type>SPARK</application-type>
  <keep-containers-across-application-attempts>false</keep-containers-across-application-attempts>
  <application-tags>
    <tag>"+sparkYarnTag+"</tag>
  </application-tags>
  <am-container-node-label-expression>appMngr</am-container-node-label-expression>
  <log-aggregation-context/>
  <attempt-failures-validity-interval>1</attempt-failures-validity-interval>
  <reservation-id/>
</application-submission-context>
Step 5
Submitting the application via an HTTP POST request:
private void submitApplication(String body, String userName) throws SMSparkManagerException {
    HttpClient client = HttpClientBuilder.create().build();
    HttpPost request = new HttpPost(uri + "?user.name=" + userName);
    try {
        request.setEntity(new StringEntity(body, ContentType.APPLICATION_XML));
        HttpResponse response = client.execute(request);
        if (response.getStatusLine().getStatusCode() != 202) {
            throw new SMSparkManagerException("The application could not be submitted to Yarn, response http code " + response.getStatusLine().getStatusCode());
        }
    } catch (UnsupportedEncodingException e) {
        logger.error("The application could not be submitted due to UnsupportedEncodingException in the provided body: " + body, e);
        throw new SMSparkManagerException("Error in submitting application to yarn");
    } catch (ClientProtocolException e) {
        logger.error("The application could not be submitted due to ClientProtocolException", e);
        throw new SMSparkManagerException("Error in submitting application to yarn");
    } catch (IOException e) {
        logger.error("The application could not be submitted due to IOException", e);
        throw new SMSparkManagerException("Error in submitting application to yarn");
    }
}

Related

JMeter Java Test - ${__UUID()} function in JSON doesn't work

The task:
I need to perform a POST request to an endpoint.
The request body type is JSON:
{
  "id": "${__UUID()}"
}
I want to simulate 10 users, each sending a payload with a uniquely generated "id" field.
From the JMeter GUI it works as expected, but from Java code the function does not seem to be recognized and is treated as a plain string.
Here is the Java Code:
public JMeterClient httpSamplerProxy(String name, String endpoint, String payload, String httpMethod) {
    Arguments arguments = new Arguments();
    HTTPArgument httpArgument = new HTTPArgument();
    httpArgument.setMetaData("=");
    httpArgument.setValue(payload);
    List<Argument> args = new ArrayList<>();
    args.add(httpArgument);
    arguments.setArguments(args);
    HTTPSamplerProxy httpSamplerProxy = new HTTPSamplerProxy();
    httpSamplerProxy.setProperty(TestElement.TEST_CLASS, HTTPSamplerProxy.class.getName());
    httpSamplerProxy.setProperty(TestElement.GUI_CLASS, HttpTestSampleGui.class.getName());
    httpSamplerProxy.setName(name);
    httpSamplerProxy.setEnabled(true);
    httpSamplerProxy.setPostBodyRaw(true);
    httpSamplerProxy.setFollowRedirects(true);
    httpSamplerProxy.setAutoRedirects(false);
    httpSamplerProxy.setUseKeepAlive(true);
    httpSamplerProxy.setDoMultipart(false);
    httpSamplerProxy.setPath(endpoint);
    httpSamplerProxy.setMethod(httpMethod);
    httpSamplerProxy.setArguments(arguments);
    httpSamplerProxies.add(httpSamplerProxy);
    return this;
}
where payload is a JSON string.
I use JMeter 5.4.1.
Besides getting this to work, how can I enable logging of the POST body in Java so I can see it in the console?
I was missing the functions dependency:
<dependency>
    <groupId>org.apache.jmeter</groupId>
    <artifactId>ApacheJMeter_functions</artifactId>
    <version>5.4.1</version>
</dependency>
After adding this to the classpath, the function was recognized.

Convert json to dynamically generated protobuf in Java

Given the following json response:
{
  "id" : "123456",
  "name" : "John Doe",
  "email" : "john.doe@example.com"
}
And the following user.proto file:
message User {
  string id = 1;
  string name = 2;
  string email = 3;
}
I would like to have the possibility to dynamically create the protobuf message class (compile a .proto at runtime), so that if the json response gets enhanced with a field "phone" : "+1234567890" I could just upload a new version of the protobuf file to contain string phone = 4 and get that field exposed in the protobuf response, without a service restart.
If I were to pull these classes out of a hat, I would like to be able to write something along the lines of the following code.
import com.googlecode.protobuf.format.JsonFormat;
import com.googlecode.protobuf.Message;
import org.apache.commons.io.FileUtils;
...
public Message convertToProto(InputStream jsonInputStream) {
    // get the latest user.proto file
    String userProtoFile = FileUtils.readFileToString("user.proto");
    Message userProtoMessage = com.acme.ProtobufUtils.compile(userProtoFile);
    Message.Builder builder = userProtoMessage.newBuilderForType();
    new JsonFormat().merge(jsonInputStream, Charset.forName("UTF-8"), builder);
    return builder.build();
}
Is there an existing com.acme.ProtobufUtils.compile(...) method? Or how would I implement one? Running protoc and loading the generated class seems overkill, but I'm willing to use it if there is no other option...
You cannot compile the .proto file (at least not in Java); however, you can pre-compile the .proto into a descriptor file (.desc):
protoc --descriptor_set_out=user.desc user.proto
and then use the DynamicMessage's parser:
DynamicMessage.parseFrom(Descriptors.Descriptor type, byte[] data)
Source: google groups thread
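To make this concrete, here is a minimal sketch of the descriptor-based approach (my own code; it assumes user.desc was produced by the protoc command above, that the .proto has no imports, and that protobuf-java-util is on the classpath for the JSON parsing):
import com.google.protobuf.DescriptorProtos.FileDescriptorSet;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.util.JsonFormat;
import java.io.FileInputStream;

public class JsonToDynamicProto {

    public static DynamicMessage convert(String descriptorPath, String messageName, String json) throws Exception {
        // Load the pre-compiled descriptor set (protoc --descriptor_set_out=user.desc user.proto).
        FileDescriptorSet set;
        try (FileInputStream in = new FileInputStream(descriptorPath)) {
            set = FileDescriptorSet.parseFrom(in);
        }
        // Build the runtime descriptor for the first (and only) file in the set.
        FileDescriptor fileDescriptor = FileDescriptor.buildFrom(set.getFile(0), new FileDescriptor[]{});
        Descriptor type = fileDescriptor.findMessageTypeByName(messageName);

        // Merge the JSON payload into a dynamic message of that type.
        DynamicMessage.Builder builder = DynamicMessage.newBuilder(type);
        JsonFormat.parser().ignoringUnknownFields().merge(json, builder);
        return builder.build();
    }

    public static void main(String[] args) throws Exception {
        String json = "{ \"id\": \"123456\", \"name\": \"John Doe\", \"email\": \"john.doe@example.com\" }";
        DynamicMessage user = convert("user.desc", "User", json);
        System.out.println(user);
    }
}
Uploading a new user.desc (generated from the extended .proto) is then enough to expose the new field, without a service restart.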

Rest Api Zabbix (method : item.get) how to get full names of the metrics

My method for obtaining item IDs from Zabbix:
protected String getItemId(String host, String zabbixHostItemName) {
    JSONObject hostItemsFilter = new JSONObject();
    hostItemsFilter.put("name", new String[]{zabbixHostItemName});
    return connectZabbix.zabbixAPI.call(RequestBuilder.newBuilder()
            .method("item.get")
            .paramEntry("filter", hostItemsFilter)
            .paramEntry("host", host)
            .build()).getJSONArray("result").getJSONObject(0).getString("itemid");
}
which generates the following request body:
{
  "jsonrpc": "2.0",
  "method": "item.get",
  "params": {
    "filter": {
      "name": [
        "myItem"
      ]
    },
    "host": "myHost"
  }
}
It almost always works well.
The problem occurs when Zabbix returns parameterized metric names.
For example, if I request the metric:
Incoming network traffic on lan900
my method returns an error, because the network interface item names are parameterized.
If I request all the metrics on the host from Zabbix, the needed "Incoming network traffic on ..." item matches the name:
Incoming network traffic on $1
How can I build a query that finds the itemid from the full metric name and host?
The current item API cannot expand macros automatically; it's a feature implemented, for instance, in the trigger API (expandComment, expandDescription, expandExpression).
You can upvote this feature request.
You can do a first query for "Incoming network traffic on $1", which will return an array of matching items, one for each network interface in your case.
Then you can filter on the 'key_' field with the real interface name.
A small python sample:
f = { 'name' : 'Incoming packet on $1' }
hostname = 'somehostname'
itemObj = zapi.item.get(filter=f, host=hostname, output=['itemids', 'name', 'key_'] )
for item in itemObj:
    if re.search('eth0', item['key_']):
        print item['itemid']
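Translated to the Java client used in the question, a rough sketch of the same approach might look like this (it reuses the question's connectZabbix object and the fastjson types returned by the call; treat it as an illustration rather than a tested implementation):
// Query by the parameterized item name, then filter on the "key_" field
// for the real interface (e.g. "lan900").
protected String getItemIdByInterface(String host, String parameterizedName, String interfaceName) {
    JSONObject hostItemsFilter = new JSONObject();
    hostItemsFilter.put("name", new String[]{parameterizedName});
    JSONArray result = connectZabbix.zabbixAPI.call(RequestBuilder.newBuilder()
            .method("item.get")
            .paramEntry("filter", hostItemsFilter)
            .paramEntry("host", host)
            .paramEntry("output", new String[]{"itemid", "name", "key_"})
            .build()).getJSONArray("result");
    for (int i = 0; i < result.size(); i++) {
        JSONObject item = result.getJSONObject(i);
        if (item.getString("key_").contains(interfaceName)) {
            return item.getString("itemid");
        }
    }
    return null;
}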

Adding an attachment on Azure CosmosDB

I am looking for some help on how to add an attachment in Cosmos DB. Here is a little background.
Our application is currently on IBM Bluemix and we use Cloudant DB to store attachments (PDF files). We are now moving to Azure PaaS App Service and planning to use Cosmos DB. I am looking for help on how to create an attachment in Cosmos DB using the Java API. Which API do I need to use? I want to do a small POC.
Thanks,
Personally, I feel that in Azure, if you want to put files into DocumentDB, you will pay a high query cost. Instead, the normal practice is to use Azure Blob storage and save the link in a field, then return the URL if it is public, or the binary data if you want it secured.
However, you could store it like this:
var myDoc = new { id = "42", Name = "Max", City="Aberdeen" }; // this is the document you are trying to save
var attachmentStream = File.OpenRead("c:/Path/To/File.pdf"); // this is the document stream you are attaching
var client = await GetClientAsync();
var createUrl = UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName);
Document document = await client.CreateDocumentAsync(createUrl, myDoc);
await client.CreateAttachmentAsync(document.SelfLink, attachmentStream, new MediaOptions()
{
    ContentType = "application/pdf", // your application type
    Slug = "78", // this is actually attachment ID
});
WORKING WITH ATTACHMENTS
I have answered a similar question here
What client API can I use?
You could follow the Cosmos DB Java SDK to CRUD attachments.
import com.microsoft.azure.documentdb.*;
import java.util.UUID;

public class CreateAttachment {
    // Replace with your DocumentDB end point and master key.
    private static final String END_POINT = "***";
    private static final String MASTER_KEY = "***";

    public static void main(String[] args) throws Exception, DocumentClientException {
        DocumentClient documentClient = new DocumentClient(END_POINT,
                MASTER_KEY, ConnectionPolicy.GetDefault(),
                ConsistencyLevel.Session);
        String uuid = UUID.randomUUID().toString();
        Attachment attachment = getAttachmentDefinition(uuid, "application/text");
        RequestOptions options = new RequestOptions();
        ResourceResponse<Attachment> attachmentResourceResponse = documentClient.createAttachment(getDocumentLink(), attachment, options);
    }

    private static Attachment getAttachmentDefinition(String uuid, String type) {
        return new Attachment(String.format(
                "{" +
                "  'id': '%s'," +
                "  'media': 'http://xstore.'," +
                "  'MediaType': 'Book'," +
                "  'Author': 'My Book Author'," +
                "  'Title': 'My Book Title'," +
                "  'contentType': '%s'" +
                "}", uuid, type));
    }
}
In the documentation it says the total file size we can store is 2 GB: "Azure Cosmos DB allows you to store binary blobs/media either with Azure Cosmos DB (maximum of 2 GB per account)". Is that the maximum we can store?
Yes. The size of attachments is limited in DocumentDB. However, there are two methods for creating an Azure Cosmos DB document attachment.
1. Store the file as an attachment to a document.
The raw attachment is included as the body of the POST.
Two headers must be set:
Slug – the name of the attachment.
contentType – set to the MIME type of the attachment.
2. Store the URL for the file in an attachment to a document.
The body of the POST includes the following:
id – the unique name that identifies the attachment, i.e. no two attachments will share the same id. The id must not exceed 255 characters.
media – the URL link or file path where the attachment resides.
The following is an example:
{
  "id": "device\A234",
  "contentType": "application/x-zip-compressed",
  "media": "www.bing.com/A234.zip"
}
If your files are over the limit, you could try storing them the second way. For more details, please refer to the blog.
In addition, note that Cosmos DB attachments support a garbage collection mechanism: the media is garbage collected when all of the outstanding references are dropped.
Hope it helps you.

How do I decode the headers of a message using the Gmail API in Java?

I am working on a fragment in Android Studio whose purpose is to display the To/From, Subject, and body of a message. So far, I am able to retrieve, decode, and display the body. I tried using a similar method for the headers, but for some reason it isn't decoding properly, or my method calls aren't getting the correct information. Here is the code I am working with:
String user = "me";
String query = "in:inbox is:unread";
textView.setText("Inbox");
ListMessagesResponse messageResponse =
        mService.users().messages().list(user).setQ(query).setMaxResults(Long.valueOf(1)).execute();
List<Message> messages = messageResponse.getMessages();
for (Message message : messages) {
    Message message2 = mService.users().messages().get(user, message.getId()).execute();
    // Get Headers
    byte[] headerBytes = Base64.decodeBase64(message2.getPayload().getParts().get(0).getHeaders().get(0).getName().toString().trim()); // get headers
    String header = new String(headerBytes, "UTF-8");
    // Get Body
    byte[] bodyBytes = Base64.decodeBase64(message2.getPayload().getParts().get(0).getBody().getData().trim().toString()); // get body
    String body = new String(bodyBytes, "UTF-8");
    messageList.add(header);
    messageList.add(body);
}
return messageList;
The section under // Get Body works, but the section under // Get Headers returns data with strange symbols, including black diamonds with white question marks inside and letters in random order. I have tried many different combinations and orders for the method calls in the Base64.decodeBase64 statement for headerBytes but wasn't able to succeed. Is there something I am missing?
Edit: I looked at the gmail-api documentation on the google developers site and I still am confused on how the header information is stored and how to retrieve specific things such as To, From, and Subject. That might be my problem since I may not be targeting the correct data.
If I list messages and get the first one, we can see what the message looks like:
Request
format = metadata
metadataHeaders = From,To,Subject
fields = payload/headers
GET https://www.googleapis.com/gmail/v1/users/me/messages/15339f3d12042fec?format=metadata&metadataHeaders=To&metadataHeaders=From&metadataHeaders=Subject&fields=payload%2Fheaders&access_token={ACCESS_TOKEN}
Response
{
"payload": {
"headers": [
{
"name": "To",
"value": "Emil <emtholin#gmail.com>"
},
{
"name": "From",
"value": "\"BernieSanders.com\" <info#berniesanders.com>"
},
{
"name": "Subject",
"value": "5,000,000"
}
]
}
}
As you can see, the values you are looking for are in the headers. You just have to sort them out in Java and you are done. The headers are not encoded like the body, so there is no need to do any decoding.
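For illustration, here is a small sketch of the "sort them out in Java" part, dropped into the question's loop right after message2 is fetched (MessagePartHeader is the Gmail model class com.google.api.services.gmail.model.MessagePartHeader; the concatenated output line is just an example):
// The To/From/Subject values are plain strings in the payload headers,
// so no Base64 decoding is needed.
String to = null, from = null, subject = null;
for (MessagePartHeader h : message2.getPayload().getHeaders()) {
    if ("To".equalsIgnoreCase(h.getName())) {
        to = h.getValue();
    } else if ("From".equalsIgnoreCase(h.getName())) {
        from = h.getValue();
    } else if ("Subject".equalsIgnoreCase(h.getName())) {
        subject = h.getValue();
    }
}
messageList.add("From: " + from + " | To: " + to + " | Subject: " + subject);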
