strange packets seen when using okhttp to upload a file - java

I am trying to upload a file into a server through okhttp, I am using a form so using the MultipartBuilder.
OkHttpClient client = new OkHttpClient();
MediaType OCTET_STREAM = MediaType.parse("application/octet-stream");
RequestBody body = new MultipartBuilder()
.type(MultipartBuilder.FORM)
.addPart(Headers.of("Content-Disposition",
"form-data; name=\"breadcrumb\"; filename=\"myfile.bin\"",
"Content-Transfer-Encoding", "binary"),
RequestBody.create(OCTET_STREAM, file))
.build();
Request request = new Request.Builder()
.header("Authorization", cred)
.url(url)
.post(body)
.build();
Response response = client.newCall(request).execute();
But when tracing what goes on the wire:
POST /breadcrumb/ HTTP/1.1
Authorization: Basic aHl6OmhvbGExMjM=
Content-Type: multipart/form-data; boundary=9bc835d6-24b8-42c4-ae8d-5bc89b3fe68f
Transfer-Encoding: chunked
Host: myurl:8000
Connection: Keep-Alive
Accept-Encoding: gzip
10e
--9bc835d6-24b8-42c4-ae8d-5bc89b3fe68f
Content-Disposition: form-data; name="breadcrumb"; filename="myfile.bin"
Content-Transfer-Encoding: binary
Content-Type: application/octet-stream
Content-Length: 21
some file content in binary
--9bc835d6-24b8-42c4-ae8d-5bc89b3fe68f--
0
When I use the traditional Apache http builders, it looks similar, but I don't see the strange characters at the beginning and at the end (10e, 0). Any ideas?
Thanks for your help.

You didn't specified exact Content-Length header, so OkHttpClient started to use chunked transfer encoding.
In HTTP protocol receiver must always know exact length of content(for purpose of allocating memory or other resources) before content will be realy send to server. There is two ways to send it - whole length of content in Content-Length header or by using chucked encoding if content-length can't be calculated at start of request.
This line:
10e
is just says that after that line client will send part of some data with length 0x10e (270) bytes.

The current MultipartBuilder implementation does not support setting a fixed Content-Length. One option would be implementing a FixedMultipartBuilder, which does its magic in the public long contentLength() method. Instead of returning -1L it now computes the length, e.g.:
import com.squareup.okhttp.Headers;
import com.squareup.okhttp.MediaType;
import com.squareup.okhttp.MultipartBuilder;
import com.squareup.okhttp.RequestBody;
import com.squareup.okhttp.internal.Util;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import okio.BufferedSink;
import okio.ByteString;
/**
* Fluent API to build <a href="http://www.ietf.org/rfc/rfc2387.txt">RFC
* 2387</a>-compliant request bodies.
*/
public final class FixedMultipartBuilder {
private static final byte[] COLONSPACE = { ':', ' ' };
private static final byte[] CRLF = { '\r', '\n' };
private static final byte[] DASHDASH = { '-', '-' };
private final ByteString boundary;
private MediaType type = MultipartBuilder.MIXED;
// Parallel lists of nullable headers and non-null bodies.
private final List<Headers> partHeaders = new ArrayList<>();
private final List<RequestBody> partBodies = new ArrayList<>();
/** Creates a new multipart builder that uses a random boundary token. */
public FixedMultipartBuilder() {
this(UUID.randomUUID().toString());
}
/**
* Creates a new multipart builder that uses {#code boundary} to separate
* parts. Prefer the no-argument constructor to defend against injection
* attacks.
*/
public FixedMultipartBuilder(String boundary) {
this.boundary = ByteString.encodeUtf8(boundary);
}
/**
* Set the MIME type. Expected values for {#code type} are
* {#link com.squareup.okhttp.MultipartBuilder#MIXED} (the default),
* {#link com.squareup.okhttp.MultipartBuilder#ALTERNATIVE},
* {#link com.squareup.okhttp.MultipartBuilder#DIGEST},
* {#link com.squareup.okhttp.MultipartBuilder#PARALLEL} and
* {#link com.squareup.okhttp.MultipartBuilder#FORM}.
*/
public FixedMultipartBuilder type(MediaType type) {
if (type == null) {
throw new NullPointerException("type == null");
}
if (!type.type().equals("multipart")) {
throw new IllegalArgumentException("multipart != " + type);
}
this.type = type;
return this;
}
/** Add a part to the body. */
public FixedMultipartBuilder addPart(RequestBody body) {
return addPart(null, body);
}
/** Add a part to the body. */
public FixedMultipartBuilder addPart(Headers headers, RequestBody body) {
if (body == null) {
throw new NullPointerException("body == null");
}
if (headers != null && headers.get("Content-Type") != null) {
throw new IllegalArgumentException("Unexpected header: Content-Type");
}
if (headers != null && headers.get("Content-Length") != null) {
throw new IllegalArgumentException("Unexpected header: Content-Length");
}
partHeaders.add(headers);
partBodies.add(body);
return this;
}
/**
* Appends a quoted-string to a StringBuilder.
*
* <p>RFC 2388 is rather vague about how one should escape special characters
* in form-data parameters, and as it turns out Firefox and Chrome actually
* do rather different things, and both say in their comments that they're
* not really sure what the right approach is. We go with Chrome's behavior
* (which also experimentally seems to match what IE does), but if you
* actually want to have a good chance of things working, please avoid
* double-quotes, newlines, percent signs, and the like in your field names.
*/
private static StringBuilder appendQuotedString(StringBuilder target, String key) {
target.append('"');
for (int i = 0, len = key.length(); i < len; i++) {
char ch = key.charAt(i);
switch (ch) {
case '\n':
target.append("%0A");
break;
case '\r':
target.append("%0D");
break;
case '"':
target.append("%22");
break;
default:
target.append(ch);
break;
}
}
target.append('"');
return target;
}
/** Add a form data part to the body. */
public FixedMultipartBuilder addFormDataPart(String name, String value) {
return addFormDataPart(name, null, RequestBody.create(null, value));
}
/** Add a form data part to the body. */
public FixedMultipartBuilder addFormDataPart(String name, String filename, RequestBody value) {
if (name == null) {
throw new NullPointerException("name == null");
}
StringBuilder disposition = new StringBuilder("form-data; name=");
appendQuotedString(disposition, name);
if (filename != null) {
disposition.append("; filename=");
appendQuotedString(disposition, filename);
}
return addPart(Headers.of("Content-Disposition", disposition.toString()), value);
}
/** Assemble the specified parts into a request body. */
public RequestBody build() {
if (partHeaders.isEmpty()) {
throw new IllegalStateException("Multipart body must have at least one part.");
}
return new MultipartRequestBody(type, boundary, partHeaders, partBodies);
}
private static final class MultipartRequestBody extends RequestBody {
private final ByteString boundary;
private final MediaType contentType;
private final List<Headers> partHeaders;
private final List<RequestBody> partBodies;
public MultipartRequestBody(MediaType type, ByteString boundary, List<Headers> partHeaders,
List<RequestBody> partBodies) {
if (type == null) throw new NullPointerException("type == null");
this.boundary = boundary;
this.contentType = MediaType.parse(type + "; boundary=" + boundary.utf8());
this.partHeaders = Util.immutableList(partHeaders);
this.partBodies = Util.immutableList(partBodies);
}
#Override public MediaType contentType() {
return contentType;
}
private long contentLengthForPart(Headers headers, RequestBody body) throws IOException {
// Check if the body has an contentLength != -1, otherwise cancel!
long bodyContentLength = body.contentLength();
if(bodyContentLength < 0L) {
return -1L;
}
long length = 0;
length += DASHDASH.length + boundary.size() + CRLF.length;
if (headers != null) {
for (int h = 0, headerCount = headers.size(); h < headerCount; h++) {
length += headers.name(h).getBytes().length
+ COLONSPACE.length
+ headers.value(h).getBytes().length
+ CRLF.length;
}
}
MediaType contentType = body.contentType();
if (contentType != null) {
length += "Content-Type: ".getBytes().length
+ contentType.toString().getBytes().length
+ CRLF.length;
}
length += CRLF.length;
length += bodyContentLength;
length += CRLF.length;
return length;
}
#Override public long contentLength() throws IOException {
long length = 0;
for (int p = 0, partCount = partHeaders.size(); p < partCount; p++) {
long contentPartLength = contentLengthForPart(partHeaders.get(p), partBodies.get(p));
if(contentPartLength < 0) {
// Too bad, can't get contentPartLength!
return -1L;
}
length += contentPartLength;
}
length += DASHDASH.length + boundary.size() + DASHDASH.length + CRLF.length;
return length;
}
#Override public void writeTo(BufferedSink sink) throws IOException {
for (int p = 0, partCount = partHeaders.size(); p < partCount; p++) {
Headers headers = partHeaders.get(p);
RequestBody body = partBodies.get(p);
sink.write(DASHDASH);
sink.write(boundary);
sink.write(CRLF);
if (headers != null) {
for (int h = 0, headerCount = headers.size(); h < headerCount; h++) {
sink.writeUtf8(headers.name(h))
.write(COLONSPACE)
.writeUtf8(headers.value(h))
.write(CRLF);
}
}
MediaType contentType = body.contentType();
if (contentType != null) {
sink.writeUtf8("Content-Type: ")
.writeUtf8(contentType.toString())
.write(CRLF);
}
// Skipping the Content-Length for individual parts
sink.write(CRLF);
partBodies.get(p).writeTo(sink);
sink.write(CRLF);
}
sink.write(DASHDASH);
sink.write(boundary);
sink.write(DASHDASH);
sink.write(CRLF);
}
}
}
Sorry for the long listing... Sadly the MultipartBuilder is final so we have to copy most of the source instead of simply extend it.

Related

Getting email text from ImageHtmlEmail

We are using apache commons mail, specifically the ImageHtmlEmail. We really would like to log every email sent - exactly as it will be sent - in the perfect world it would be something you could paste into sendmail - with all headers and other information included.
This is primarily to troubleshoot some problems we've been having with it turning up as text/plain rather than text/html - but also because it would be nice to have a record of exactly what the system sent out stored in our logs.
So essentially - the dream is a function that would take an ImageHtmlEmail and return a string - as it will be sent. I know I could render it into a string myself, but then I'm bypassing whatever is being done in the library function, which is what we really want to capture. I tried BuildMimeMessage and then getMimeMessage, which I think is probably the correct first step - but that just leaves me with the question of how to turn a mimemessage into a string.
I have a sort of solution - but would love a better one:
/**
* add content of this type
*
* #param builder
* #param content
*/
private static void addContent(final StringBuilder builder, final Object content)
{
try
{
if (content instanceof MimeMultipart)
{
final MimeMultipart multi = (MimeMultipart) content;
for (int i = 0; i < multi.getCount(); i++)
{
addContent(builder, ((MimeMultipart) content).getBodyPart(i));
}
}
else if (content instanceof MimeBodyPart)
{
final MimeBodyPart message = (MimeBodyPart) content;
final Enumeration<?> headers = message.getAllHeaderLines();
while (headers.hasMoreElements())
{
final String line = (String) headers.nextElement();
builder.append(line).append("\n");
}
addContent(builder, message.getContent());
}
else if (content instanceof String)
{
builder.append((String) content).append("\n");
}
else
{
System.out.println(content.getClass().getName());
throw CommonException.notImplementedYet();
}
}
catch (final Exception theException)
{
throw CommonException.insteadOf(theException);
}
}
/**
* get a string from an email
*
* #param email
* #return
*/
public static String fromHtmlEmail(final ImageHtmlEmail email)
{
return fromMimeMessage(email.getMimeMessage());
}
/**
* #param message
* #return a string from a mime message
*/
private static String fromMimeMessage(final MimeMessage message)
{
try
{
message.saveChanges();
final StringBuilder output = new StringBuilder();
final Enumeration<?> headers = message.getAllHeaderLines();
while (headers.hasMoreElements())
{
final String line = (String) headers.nextElement();
output.append(line).append("\n");
}
addContent(output, message.getContent());
return output.toString();
}
catch (final Exception theException)
{
throw CommonException.insteadOf(theException);
}
}
}

HtmlViewService in OBIEE Web Service API is not rendering report (it's only showing spinning loader)

I have a Java application that leverages the OBIEE Web Service API to consume data from the BI Server. I am able to XMLViewService and the WebCatalogService just fine, but I can't quite get the HtmlViewService to properly render a report in the Java app. The report just shows the spinning loader, but never actually renders the report. I'm pretty sure it has to do with the fact that the Java app and the BI Server are on different domains. This is what the API documentation says:
In situations where Oracle BI Web Services and the third-party Web server do not belong to the same Domain Name Service (DNS) domain, users may get JavaScript errors related to browser security constraints for cross-domain scripting. To avoid these issues, use the setBridge() method to modify callback URLs to point to the third-party Web server. Be aware that a Web component executed by the third-party Web server to re-route requests to Oracle BI Web Services is not provided. This function would need to be fulfilled by the third-party application.
Several years ago, I did this same type of integration using .NET/C# and ran in the the same issue because the .NET app and the BI Server were on different domains. As a result, I had to create an HTTP Handler (.ashx file) as well as use the setBridge() method to to solve the issue.
The challenge that I'm having is that I can't find a servlet bridge example for Java. And I'm not too confident in porting the .NET/.ASHX code to a Java servlet/bridge. Does anyone have any code examples or direction they could provide to point me in the right direction? Here's a snippet of code to show you what I'm doing to pull back the report data:
// define report path
ReportRef reportRef = new ReportRef();
reportRef.setReportPath(reportFolder + "/" + reportName);
// set page params
StartPageParams pageParams = new StartPageParams();
pageParams.setDontUseHttpCookies(true);
// set report params
String pageId = htmlService.startPage(pageParams, sawSessionId);
String reportId = pageId + reportName;
htmlService.addReportToPage(pageId, reportId, reportRef, null, null, null, sawSessionId);
// get report html
StringBuffer reportHtml = new StringBuffer();
reportHtml.append(htmlService.getHtmlForReport(pageId, reportId, sawSessionId));
// return html
return reportHtml.toString();
This is the error that is coming back in the browser:
XMLHttpRequest cannot load http://myobiserver.com/analytics/saw.dll?ajaxGo. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://myjavaapp.com' is therefore not allowed access.
Per requested, here is my .NET/.ASHX bridge:
using System.Collections.Specialized;
using System.Net;
using System.Text;
using System.Web;
using System;
using System.Collections;
using System.Configuration;
using System.Collections.Specialized;
using System.Web;
using System.Text;
using System.Net;
using System.IO;
using System.Diagnostics;
/*
This is a ASP.NET handler that handles communication
between the SharePoint site and OracleBI.
It will be deployed to:
C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\LAYOUTS\OracleBI
*/
public class OracleBridge: IHttpHandler
{
public bool IsReusable {get{return true;}}
public OracleBridge()
{
}
string getServer()
{
string strServer = "http://<enter-domain>/analytics/saw.dll";
int index = strServer.LastIndexOf("/");//split off saw.dll
if (index >=0)
return strServer.Substring(0,index+1);
else
return strServer;
}
public void ProcessRequest(HttpContext context)
{
HttpWebRequest req = forwardRequest(context);
forwardResponse(context,req);
}
private HttpWebRequest forwardRequest(HttpContext context)
{
string strURL = makeURL(context);
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(strURL);
req.Method = context.Request.HttpMethod;
NameValueCollection headers = context.Request.Headers;
req.Accept = headers.Get("Accept");
req.Expect = headers.Get("Expect");
req.ContentType = headers.Get("Content-Type");
string strModifiedSince = headers.Get("If-Modified-Since");
if (strModifiedSince != null && strModifiedSince.Length != 0)
req.IfModifiedSince = DateTime.Parse(strModifiedSince);
req.Referer = headers.Get("Referer");
req.UserAgent = headers.Get("User-Agent");
if (!req.Method.Equals("GET"))
{
CopyStreams(context.Request.InputStream,req.GetRequestStream());
}
return req;
}
private void forwardResponse(HttpContext context, HttpWebRequest req)
{
HttpWebResponse resp =null;
try
{
resp = (HttpWebResponse)req.GetResponse();
}
catch(WebException e)
{
resp = (HttpWebResponse)e.Response;
}
context.Response.StatusCode = (int)resp.StatusCode;
for (int i = 0; i < resp.Cookies.Count; i++)
{
Cookie c = resp.Cookies[i];
HttpCookie hc = new HttpCookie(c.Name, c.Value);
hc.Path = c.Path;
hc.Domain = getServer();
context.Response.Cookies.Add(hc);
}
context.Response.ContentType = resp.ContentType;
CopyStreams(resp.GetResponseStream(), context.Response.OutputStream);
}
private string makeURL(HttpContext context)
{
string strQuery = context.Request.Url.Query;
string[] arrParams = strQuery.Split('?','&');
StringBuilder resultingParams = new StringBuilder();
string strURL=null;
foreach(string strParam in arrParams )
{
string[] arrNameValue = strParam.Split('=');
if (!arrNameValue[0].Equals("RedirectURL"))
{
if (strParam.Length != 0)
{
if (resultingParams.Length != 0)
resultingParams.Append("&");
resultingParams.Append(strParam);
}
}
else if (arrNameValue.Length >1)
strURL = HttpUtility.UrlDecode(arrNameValue[1]);
}
if (strURL ==null)
throw new Exception("Invalid URL format. requestURL parameter is missing");
String sAppendChar = strURL.Contains("?") ? "&" : "?";
if (strURL.StartsWith("http:") || strURL.StartsWith("https:"))
{
String tmpURL = strURL + sAppendChar + resultingParams.ToString();
return tmpURL;
}
else
{
String tmpURL = getServer() + strURL + sAppendChar + resultingParams.ToString();
return tmpURL;
}
}
private void CopyStreams(Stream inStr,Stream outStr)
{
byte[] buf = new byte[4096];
try
{
do
{
int iRead = inStr.Read(buf,0,4096);
if (iRead == 0)
break;
outStr.Write(buf,0,iRead);
}
while (true);
}
finally
{
outStr.Close();
}
}
}
Using the BridgeServlet from link (http://pastebin.com/NibVnBLb) posted in previous answer did not work for us. In our web portal, embedding Oracle BI dashboard using BridgeServlet above, redirected us to OBI login page and console log showed incorrect resources(js/css) web links (local URL instead of OBIEE URL's).
Instead we used this class (with some minor adjustments) https://gist.github.com/rafaeltuelho/9376341#file-obieehttpservletbridge-java.
Tested with Java 11, Oracle Business Intelligence 12.2.1.4.0, WSDL v12 of OBIEE (http://OBIEE-server:port/analytics/saw.dll/wsdl/v12), with SSO disabled.
Here it is the BridgeServlet class:
package com.abs.bi;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import javax.servlet.ServletException;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
/**
* Servlet implementation class OBIEEBridge
*/
public class BridgeServlet extends HttpServlet {
private static final long serialVersionUID = 1L;
/**
* #see HttpServlet#HttpServlet()
*/
public BridgeServlet() {
super();
}
/**
* #see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse
* response)
*/
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
try {
this.processRequest(request, response);
} catch (Exception e) {
throw new ServletException(e);
}
}
/**
* #see HttpServlet#doPost(HttpServletRequest request, HttpServletResponse
* response)
*/
protected void doPost(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
try {
this.processRequest(request, response);
} catch (Exception e) {
throw new ServletException(e);
}
}
protected void processRequest(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
HttpURLConnection urlCon = forwardRequest(request);
forwardResponse(response, urlCon);
}
#SuppressWarnings("unchecked")
private String decodeURL(HttpServletRequest request) {
StringBuffer bufURL = new StringBuffer("");
Map<String, String[]> params = request.getParameterMap();
String[] arrURL = params.get("RedirectURL");
String strURL = arrURL == null || arrURL.length == 0 ? null : arrURL[0];
bufURL.append(strURL);
int nQIndex = strURL.lastIndexOf('?');
if (params != null && !params.isEmpty()) {
bufURL.append(((nQIndex >= 0) ? "&" : "?"));
Set<String> keys = params.keySet();
Iterator<String> it = keys.iterator();
while (it.hasNext()) {
try {
String strKey = it.next();
if (strKey.equalsIgnoreCase("RedirectURL")) {
continue;
}
String strEncodedKey = URLEncoder.encode(strKey, "UTF-8");
String[] paramValues = params.get(strKey);
for (String paramValue : paramValues) {
bufURL.append(strEncodedKey);
bufURL.append("=");
bufURL.append(URLEncoder.encode(paramValue, "UTF-8"));
bufURL.append("&");
}
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
}
bufURL.deleteCharAt(bufURL.length() - 1);
}
return bufURL.toString();
}
#SuppressWarnings("unchecked")
private HttpURLConnection forwardRequest(HttpServletRequest request) throws IOException {
String strURL = decodeURL(request);
String[] arrURL = strURL.split("&", 2);
String baseURL = arrURL[0];
URL url = new URL(baseURL);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
String strMethod = request.getMethod();
con.setRequestMethod(strMethod);
Enumeration<String> en = request.getHeaderNames();
String strHeader;
while (en.hasMoreElements()) {
strHeader = en.nextElement();
String strHeaderValue = request.getHeader(strHeader);
con.setRequestProperty(strHeader, strHeaderValue);
}
// is not a HTTP GET request
if (strMethod.compareTo("GET") != 0) {
con.setDoOutput(true);
con.setDoInput(true);
con.setUseCaches(false);
DataOutputStream forwardStream = new DataOutputStream(con.getOutputStream());
try {
String urlParameters = arrURL[1];
forwardStream.writeBytes(urlParameters);
forwardStream.flush();
} finally {
forwardStream.close();
}
}
return con;
}
private void forwardResponse(HttpServletResponse response, HttpURLConnection con) throws IOException {
int nContentLen = -1;
String strKey;
String strValue;
try {
response.setStatus(con.getResponseCode());
for (int i = 1; true; ++i) {
strKey = con.getHeaderFieldKey(i);
strValue = con.getHeaderField(i);
if (strKey == null) {
break;
}
if (strKey.equals("Content-Length")) {
nContentLen = Integer.parseInt(con.getHeaderField(i));
continue;
}
if (strKey.equalsIgnoreCase("Connection") || strKey.equalsIgnoreCase("Server")
|| strKey.equalsIgnoreCase("Transfer-Encoding") || strKey.equalsIgnoreCase("Content-Length")) {
continue; // skip certain headers
}
if (strKey.equalsIgnoreCase("Set-Cookie")) {
String[] cookieStr1 = strValue.split(";");
String[] cookieStr2 = cookieStr1[0].split("=");
// String[] cookieStr3 = cookieStr1[1].split("=");
/*
* Change the Set-Cookie HTTP Header to remove the 'path' attribute. Thus the
* browser can accept the ORA_BIPS_NQID cookie from Oracle BI Server
*/
Cookie c = new Cookie(cookieStr2[0], cookieStr2[1]);
c.setPath("/");
response.addCookie(c);
} else {
response.setHeader(strKey, strValue);
}
}
copyStreams(con.getInputStream(), response.getOutputStream(), nContentLen);
} finally {
response.getOutputStream().close();
con.getInputStream().close();
}
}
private void copyStreams(InputStream inputStream, OutputStream forwardStream, int nContentLen) throws IOException {
byte[] buf = new byte[1024];
int nCount = 0;
int nBytesToRead = 1024;
int nTotalCount = 0;
do {
if (nContentLen != -1)
nBytesToRead = nContentLen - nTotalCount > 1024 ? 1024 : nContentLen - nTotalCount;
if (nBytesToRead == 0)
break;
// try to read some bytes from src stream
nCount = inputStream.read(buf, 0, nBytesToRead);
if (nCount < 0)
break;
nTotalCount += nCount;
// try to write some bytes in target stream
forwardStream.write(buf, 0, nCount);
} while (true);
}
}
AbsServiceUtils which contains SAWSessionService and HtmlViewService web service calls to Oracle BI server:
package com.abs.bi;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.jsp.JspWriter;
import oracle.bi.web.soap.AuthResult;
import oracle.bi.web.soap.HtmlViewService;
import oracle.bi.web.soap.HtmlViewServiceSoap;
import oracle.bi.web.soap.ReportHTMLLinksMode;
import oracle.bi.web.soap.ReportHTMLOptions;
import oracle.bi.web.soap.ReportRef;
import oracle.bi.web.soap.SAWLocale;
import oracle.bi.web.soap.SAWSessionParameters;
import oracle.bi.web.soap.SAWSessionService;
import oracle.bi.web.soap.SAWSessionServiceSoap;
import oracle.bi.web.soap.StartPageParams;
public class AbsServiceUtils {
private AbsServiceUtils() {
}
public static URL buildWsdlUrl() throws MalformedURLException {
return new URL(IAbsService.BASEWSDLURL);
}
public static void writeBiContent(HttpServletRequest request, JspWriter out, String biReport) throws IOException {
String userAgent = request.getHeader("User-Agent");
Locale userLocale = request.getLocale();
String bridgeServletContextPath = request.getContextPath() + "/bridgeservlet";
String reportHtml = writeBiContent(biReport, userAgent, userLocale, bridgeServletContextPath);
if (out != null) {
out.println(reportHtml);
}
}
public static String writeBiContent(String biReport, String userAgent, Locale userLocale,
String bridgeServletContextPath) throws MalformedURLException {
HtmlViewService htmlViewService = new HtmlViewService(buildWsdlUrl());
HtmlViewServiceSoap htmlClient = htmlViewService.getHtmlViewService();
SAWSessionService sAWSessionService = new SAWSessionService(buildWsdlUrl());
SAWSessionServiceSoap myPort = sAWSessionService.getSAWSessionServiceSoap();
SAWSessionParameters sessionparams = new SAWSessionParameters();
sessionparams.setUserAgent(userAgent);
SAWLocale sawlocale = new SAWLocale();
sawlocale.setLanguage(userLocale.getLanguage());
sawlocale.setCountry(userLocale.getCountry());
sessionparams.setLocale(sawlocale);
sessionparams.setAsyncLogon(false);
AuthResult result = myPort.logonex(IAbsService.BIUSERNAME, IAbsService.BIPASSWORD, sessionparams);
String sessionID = result.getSessionID();
List<String> keepAliveSessionList = new ArrayList<>(1);
keepAliveSessionList.add(sessionID);
myPort.keepAlive(keepAliveSessionList);
StartPageParams spparams = new StartPageParams();
spparams.setDontUseHttpCookies(true);
String pageID = htmlClient.startPage(spparams, sessionID);
/**
* This method will set the path to the servlet which will act like a bridge to
* retrieve all the OBIEE resources like the javascript, CSS and the report.
*/
if (bridgeServletContextPath != null) {
htmlClient.setBridge(bridgeServletContextPath, sessionID);
}
ReportHTMLOptions htmlOptions = new ReportHTMLOptions();
htmlOptions.setEnableDelayLoading(false);
htmlOptions.setLinkMode(ReportHTMLLinksMode.IN_PLACE.value());
ReportRef reportref = new ReportRef();
reportref.setReportPath(IAbsService.BIROOTPATH + biReport);
StartPageParams startpageparams = new StartPageParams();
startpageparams.setDontUseHttpCookies(false);
htmlClient.addReportToPage(pageID, biReport.replace(" ", ""), reportref, null, null, htmlOptions, sessionID);
String reportHtml = htmlClient.getHeadersHtml(pageID, sessionID);
reportHtml = reportHtml + htmlClient.getHtmlForReport(pageID, biReport.replace(" ", ""), sessionID);
reportHtml = reportHtml + htmlClient.getCommonBodyHtml(pageID, sessionID);
return reportHtml;
}
}
IAbsService:
package com.abs.bi;
public interface IAbsService {
public static final String BASEWSDLURL = "http://<OracleBIServer:port>/analytics/saw.dll/wsdl/v12";
public static final String BIUSERNAME = "USER";
public static final String BIPASSWORD = "PASS";
public static final String BIROOTPATH = "/shared/sharedfolder/";
public static final String BIREPORTNAME = "report";
}
Use the link http://pastebin.com/NibVnBLb to check out the bridge code for java. Hope this might be helful.

crawler for gwt application taking too much time

i have a gwt application that i need to optimize for seo ( crawl the content for google), and i have been trying many solutions wich are not meeting our needs (it's taking us a big amount of time to return the html page), the trials are:
I tried to use htmlUnit as headless browser to crawl the page on demand, it takes about 15 second to get the html content (when auditing this timing, it results that 80% of this timing is taken by a loop that waits for background javascript "while (waitForBackgroundJavaScript > 0 && loopCount < _maxLoopChecks) ")
A technic that consists on crawling the page prior to google request, then giving the saved snapshot when google is asking for it (but this solution is definitely not convenient because the content changes very frequently and google may consider this as a "CLOACKING")
Any suggestion?
the code used to crawl:
public class CrawlFilter implements Filter {
private class SyncAllAjaxController extends NicelyResynchronizingAjaxController {
private static final long serialVersionUID = 1L;
#Override
public boolean processSynchron(HtmlPage page, WebRequest request, boolean async) {
return true;
}
}
private final Logger log = Logger.getLogger(CrawlFilter.class.getName());
/**
* Special URL token that gets passed from the crawler to the servlet
* filter. This token is used in case there are already existing query
* parameters.
*/
private static final String ESCAPED_FRAGMENT_FORMAT1 = "_escaped_fragment_=";
private static final int ESCAPED_FRAGMENT_LENGTH1 = ESCAPED_FRAGMENT_FORMAT1.length();
/**
* Special URL token that gets passed from the crawler to the servlet
* filter. This token is used in case there are not already existing query
* parameters.
*/
private static final String ESCAPED_FRAGMENT_FORMAT2 = "&" + ESCAPED_FRAGMENT_FORMAT1;
private static final int ESCAPED_FRAGMENT_LENGTH2 = ESCAPED_FRAGMENT_FORMAT2.length();
private static final long _pumpEventLoopTimeoutMillis = 30000;
private static final long _jsTimeoutMillis = 1000;
private static final long _pageWaitMillis = 200;
private static final int _maxLoopChecks = 2;
private WebClient webClient;
public void doFilter(ServletRequest request, ServletResponse response,
FilterChain filterChain) throws IOException, ServletException {
// Grab the request uri and query strings.
final HttpServletRequest httpRequest = (HttpServletRequest) request;
final String requestURI = httpRequest.getRequestURI();
final String queryString = httpRequest.getQueryString();
final HttpServletResponse httpResponse = (HttpServletResponse) response;
if ((queryString != null) && (queryString.contains(ESCAPED_FRAGMENT_FORMAT1))) {
final int port = httpRequest.getServerPort();
final String urlStringWithHashFragment = requestURI + rewriteQueryString(queryString);
final String scheme = httpRequest.getScheme();
final URL urlWithHashFragment = new URL(scheme, "127.0.0.1", port, urlStringWithHashFragment);
final WebRequest webRequest = new WebRequest(urlWithHashFragment);
log.fine("Crawl filter encountered escaped fragment, will open: " + webRequest.toString());
httpResponse.setContentType("text/html;charset=UTF-8");
final PrintWriter out = httpResponse.getWriter();
out.println(renderPage(webRequest));
out.flush();
out.close();
log.fine("HtmlUnit completed webClient.getPage(webRequest) where webRequest = " + webRequest.toString());
} else {
filterChain.doFilter(request, response);
}
}
#Override
public void destroy() {
if (webClient != null) {
webClient.closeAllWindows();
}
}
#Override
public void init(FilterConfig config) throws ServletException {
}
private StringBuilder renderPage(WebRequest webRequest) throws IOException {
webClient = new WebClient(BrowserVersion.FIREFOX_17);
webClient.getCache().clear();
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setRedirectEnabled(false);
webClient.setAjaxController(new SyncAllAjaxController());
webClient.setCssErrorHandler(new SilentCssErrorHandler());
final HtmlPage page = webClient.getPage(webRequest);
webClient.getJavaScriptEngine().pumpEventLoop(_pumpEventLoopTimeoutMillis);
int waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(_jsTimeoutMillis);
int loopCount = 0;
while (waitForBackgroundJavaScript > 0 && loopCount < _maxLoopChecks) {
++loopCount;
waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(_jsTimeoutMillis);
if (waitForBackgroundJavaScript == 0) {
log.fine("HtmlUnit exits background javascript at loop counter " + loopCount);
break;
}
synchronized (page) {
log.fine("HtmlUnit waits for background javascript at loop counter " + loopCount);
try {
page.wait(_pageWaitMillis);
} catch (InterruptedException e) {
log.log(Level.SEVERE, "HtmlUnit ERROR on page.wait at loop counter " + loopCount, e);
}
}
}
webClient.getAjaxController().processSynchron(page, webRequest, false);
if (webClient.getJavaScriptEngine().isScriptRunning()) {
log.warning("HtmlUnit webClient.getJavaScriptEngine().shutdownJavaScriptExecutor()");
webClient.getJavaScriptEngine().shutdownJavaScriptExecutor();
}
final String staticSnapshotHtml = page.asXml();
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append("<hr />\n");
stringBuilder.append("<center><h3>This is a non-interactive snapshot for crawlers. Follow <a href=\"");
stringBuilder.append(webRequest.getUrl() + "\">this link</a> for the interactive application.<br></h3></center>");
stringBuilder.append("<hr />");
stringBuilder.append(staticSnapshotHtml);
return stringBuilder;
}
/**
* Maps from the query string that contains _escaped_fragment_ to one that
* doesn't, but is instead followed by a hash fragment. It also unescapes any
* characters that were escaped by the crawler. If the query string does not
* contain _escaped_fragment_, it is not modified.
*
* #param queryString
* #return A modified query string followed by a hash fragment if applicable.
* The non-modified query string otherwise.
* #throws UnsupportedEncodingException
*/
private static String rewriteQueryString(String queryString) throws UnsupportedEncodingException {
int index = queryString.indexOf(ESCAPED_FRAGMENT_FORMAT2);
int length = ESCAPED_FRAGMENT_LENGTH2;
if (index == -1) {
index = queryString.indexOf(ESCAPED_FRAGMENT_FORMAT1);
length = ESCAPED_FRAGMENT_LENGTH1;
}
if (index != -1) {
StringBuilder queryStringSb = new StringBuilder();
if (index > 0) {
queryStringSb.append("?");
queryStringSb.append(queryString.substring(0, index));
}
queryStringSb.append("#!");
queryStringSb.append(URLDecoder.decode(queryString.substring(index
+ length, queryString.length()), "UTF-8"));
return queryStringSb.toString();
}
return queryString;
}
}
I suggest having HtmlUnit generate the static html offline. You control the update frequency.
Then, have your servlet filter intercepting the crawler request return the already generated static html.

Use servlet filter to remove a form parameter from posted data

A vendor has been posting XML data over HTTPS within a form variable named XMLContent to my Coldfusion application server. I recently moved to a more recent version of the application server and those requests are throwing 500 server errors. It is throwing the error because a second form parameter's content is not properly urlencoded, but I don't need that parameter anyway. (I contacted the vendor to fix this but they are forcing me to pay to fix their mistake so I'm looking to fix it myself if possible.)
How would I utilize a servlet filter to remove all but the form parameter named: XMLContent
I have tried various attempts to explicitly remove the offending parameter "TContent" but it never gets removed.
A snippet of the data being received:
XMLContent=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22%3F%3E%0A%3CCheck+xmlns%3D%22http .........&TContent=<!--?xml version="1.0" encoding="UTF-8"?--><check xmlns="http...........
The code I've tried:
import java.io.IOException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import java.util.*;
public class MultipartFilter implements Filter {
// Init ----------------------------------------------------------------
public FilterConfig filterConfig;
// Actions -------------------------------------------------------------
public void init(FilterConfig filterConfig) throws ServletException {
this.filterConfig = filterConfig;
}
/**
* Check the type request and if it is a HttpServletRequest, then parse the request.
* #throws ServletException If parsing of the given HttpServletRequest fails.
* #see javax.servlet.Filter#doFilter(
* javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain)
*/
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
throws ServletException, IOException
{
// Check type request.
if (request instanceof HttpServletRequest) {
// Cast back to HttpServletRequest.
HttpServletRequest httpRequest = (HttpServletRequest) request;
// Parse HttpServletRequest.
HttpServletRequest parsedRequest = parseRequest(httpRequest);
// Continue with filter chain.
chain.doFilter(parsedRequest, response);
} else {
// Not a HttpServletRequest.
chain.doFilter(request, response);
}
}
/**
* #see javax.servlet.Filter#destroy()
*/
public void destroy() {
this.filterConfig = null;
}
private HttpServletRequest parseRequest(HttpServletRequest request) throws ServletException {
// Prepare the request parameter map.
Map<String, String[]> parameterMap = new HashMap<String, String[]>();
// Loop through form parameters.
Enumeration<String> parameterNames = request.getParameterNames();
while (parameterNames.hasMoreElements()) {
String paramName = parameterNames.nextElement();
String[] paramValues = request.getParameterValues(paramName);
// Add just the XMLContent form parameter
if (paramName.equalsIgnoreCase("xmlcontent")) {
parameterMap.put(paramName, new String[] { paramValues[0] });
}
}
// Wrap the request with the parameter map which we just created and return it.
return wrapRequest(request, parameterMap);
}
// Utility (may be refactored to public utility class) ---------------
/**
* Wrap the given HttpServletRequest with the given parameterMap.
* #param request The HttpServletRequest of which the given parameterMap have to be wrapped in.
* #param parameterMap The parameterMap to be wrapped in the given HttpServletRequest.
* #return The HttpServletRequest with the parameterMap wrapped in.
*/
private static HttpServletRequest wrapRequest(
HttpServletRequest request, final Map<String, String[]> parameterMap)
{
return new HttpServletRequestWrapper(request) {
public Map<String, String[]> getParameterMap() {
return parameterMap;
}
public String[] getParameterValues(String name) {
return parameterMap.get(name);
}
public String getParameter(String name) {
String[] params = getParameterValues(name);
return params != null && params.length > 0 ? params[0] : null;
}
public Enumeration<String> getParameterNames() {
return Collections.enumeration(parameterMap.keySet());
}
};
}
}
We face these situations every day while handling locale. If user locale is different than browser locale, we have to update the request. But its not mutable.
Solution: create a request wrapper class. Copy existing request parameters (the ones you want) into it and pass this wrapper class in filterchain.doFilter()
sample wrapper class:
public class WrappedRequest extends HttpServletRequestWrapper
{
Map<String, String[]> parameterMap;
public WrappedRequest(final HttpServletRequest request)
{
//build your param Map here with required values
}
#Override
public Map<String, String[]> getParameterMap()
{
//return local param map
}
//override other methods if needed.
}
Now in your filter code, do following.
wrapRequest = new WrappedRequest(hreq);
filterChain.doFilter(wrapRequest, servletResponse);
Hopefully it will solve your problem.
Approach
The code follows the correct approach:
in wrapRequest(), it instantiates HttpServletRequestWrapper and overrides the 4 methods that trigger request parsing:
public String getParameter(String name)
public Map<String, String[]> getParameterMap()
public Enumeration<String> getParameterNames()
public String[] getParameterValues(String name)
the doFilter() method invokes the filter chain using the wrapped request, meaning subsequent filters, plus the target servlet (URL-mapped) will be supplied the wrapped request.
Problem
Calling any of the 4 methods getParameterXXX() methods on the underlying 'original request' triggers implicit parsing of all request parameters (via a server internal method such as Request.parseRequestParameters or parsePameters). These 4 methods are the only way to trigger such parsing.
Before the request is wrapped, in parseRequest(), your code calls such methods on the underlying request:
request.getParameterNames();
request.getParameterValues(paramName);
Solution
You need to control the parsing logic. Reading raw bytes from the input stream and doing your own URL decoding is too complex - it would mean replacing a large volume of tightly-coupled server code. Instead, the simplest approach is to just replace the method that does the actual URL decoding. That means you can leave in place the request.getParameterXXX calls mentioned in prev section.
Not sure what server you're hosting ColdFusion on, but following description is based on Glassfish/Tomcat, but can be adapted. At the bottom of this post is the internal request parsing method from Glassfish (a modified version of Tomcat). JD-decompiler is useful for this tinkering for converting .class files to .java.
Find the class that does URL decoding (for Glassfish this is com.sun.grizzly.util.http.Parameters from grizzly-utils.jar as shown below, for Tomcat this is org.apache.tomcat.util.http.Parameters from tomcat-coyote.jar)
Copy the entire source code to a new class somepackage.StrippedParameters
Find the line of code that decodes the parameter value (below: value = urlDecode(this.tmpValue))
Change this code so that it only decodes when the parameter name matches your desired parameter:
if (decodeName)
name = urlDecode(this.tmpName);
else
name = this.tmpName.toString();
//
// !! THIS IF STATEMENT ADDED TO ONLY DECODE DESIRED PARAMETERS,
// !! OTHERS ARE STRIPPED:
//
if ("XMLContent".equals(name)) {
String value;
String value;
if (decodeValue)
value = urlDecode(this.tmpValue);
else {
value = this.tmpValue.toString();
}
try
{
addParameter(name, value);
}
catch (IllegalStateException ise)
{
logger.warning(ise.getMessage());
break;
}
}
Now the tricky part: replace the default class Parameters with your class StrippedParmeters just before URL decoding occurs. Once the parameters are obtained, copy them back to the container class. For Glassfish copy the method parseRequestParameters into your implementation of HttpServletRequestWrapper (for Tomcat, the corresponding method is parseParameters in class org.apache.catalina.connector.Request in catalina.jar)
replace this line:
Parameters parameters = this.coyoteRequest.getParameters();
with:
Parameters parameters = new somepackage.StrippedParameters();
at the bottom of the method, add this:
Parameters coyoteParameters = this.coyoteRequest.getParameters();
for (String paramName : parameters.getParameterNames()) {
String paramValue = parameters.getParameterValue(paramName);
coyoteParameters.addParameter(paramName, paramValue);
}
Glassfish Container Code - Modified Version of Tomcat, with Grizzly replacing Coyote
(Catalina Servlet Engine still refers to Coyote objects, but they are Grizzly instances mocked like Coyote)
package org.apache.catalina.connector;
....
public class Request implements HttpRequest, HttpServletRequest {
....
protected com.sun.grizzly.tcp.Request coyoteRequest;
....
// This is called from the 4 methods named 'getParameterXXX'
protected void parseRequestParameters() {
Parameters parameters = this.coyoteRequest.getParameters();
parameters.setLimit(getConnector().getMaxParameterCount());
String enc = getCharacterEncoding();
this.requestParametersParsed = true;
if (enc != null) {
parameters.setEncoding(enc);
parameters.setQueryStringEncoding(enc);
} else {
parameters.setEncoding("ISO-8859-1");
parameters.setQueryStringEncoding("ISO-8859-1");
}
parameters.handleQueryParameters();
if ((this.usingInputStream) || (this.usingReader)) {
return;
}
if (!getMethod().equalsIgnoreCase("POST")) {
return;
}
String contentType = getContentType();
if (contentType == null) {
contentType = "";
}
int semicolon = contentType.indexOf(';');
if (semicolon >= 0)
contentType = contentType.substring(0, semicolon).trim();
else {
contentType = contentType.trim();
}
if ((isMultipartConfigured()) && ("multipart/form-data".equals(contentType))) {
getMultipart().init();
}
if (!"application/x-www-form-urlencoded".equals(contentType)) {
return;
}
int len = getContentLength();
if (len > 0) {
int maxPostSize = ((Connector)this.connector).getMaxPostSize();
if ((maxPostSize > 0) && (len > maxPostSize)) {
log(sm.getString("coyoteRequest.postTooLarge"));
throw new IllegalStateException("Post too large");
}
try {
byte[] formData = getPostBody();
if (formData != null)
parameters.processParameters(formData, 0, len);
} catch (Throwable t) {
}
}
}
}
package com.sun.grizzly.tcp;
import com.sun.grizzly.util.http.Parameters;
public class Request {
....
private Parameters parameters = new Parameters();
....
public Parameters getParameters() {
return this.parameters;
}
}
package com.sun.grizzly.util.http;
public class Parameters {
....
public void processParameters(byte[] bytes, int start, int len) {
processParameters(bytes, start, len, getCharset(this.encoding));
}
public void processParameters(byte[] bytes, int start, int len, Charset charset) {
if (debug > 0) {
try {
log(sm.getString("parameters.bytes", new String(bytes, start, len, "ISO-8859-1")));
} catch (UnsupportedEncodingException e) {
logger.log(Level.SEVERE, sm.getString("parameters.convertBytesFail"), e);
}
}
int decodeFailCount = 0;
int end = start + len;
int pos = start;
while (pos < end) {
int nameStart = pos;
int nameEnd = -1;
int valueStart = -1;
int valueEnd = -1;
boolean parsingName = true;
boolean decodeName = false;
boolean decodeValue = false;
boolean parameterComplete = false;
do {
switch (bytes[pos]) {
case 61:
if (parsingName) {
nameEnd = pos;
parsingName = false;
pos++; valueStart = pos;
} else {
pos++;
}
break;
case 38:
if (parsingName) {
nameEnd = pos;
} else {
valueEnd = pos;
}
parameterComplete = true;
pos++;
break;
case 37:
case 43:
if (parsingName)
decodeName = true;
else {
decodeValue = true;
}
pos++;
break;
default:
pos++;
}
} while ((!parameterComplete) && (pos < end));
if (pos == end) {
if (nameEnd == -1)
nameEnd = pos;
else if ((valueStart > -1) && (valueEnd == -1)) {
valueEnd = pos;
}
}
if ((debug > 0) && (valueStart == -1)) {
try {
log(sm.getString("parameters.noequal", Integer.valueOf(nameStart),
Integer.valueOf(nameEnd),
new String(bytes, nameStart, nameEnd - nameStart, "ISO-8859-1")));
} catch (UnsupportedEncodingException e) {
logger.log(Level.SEVERE, sm.getString("parameters.convertBytesFail"), e);
}
}
if (nameEnd <= nameStart) {
if (logger.isLoggable(Level.INFO)) {
if (valueEnd >= nameStart)
try {
new String(bytes, nameStart, valueEnd - nameStart, "ISO-8859-1");
} catch (UnsupportedEncodingException e) {
logger.log(Level.SEVERE,
sm.getString("parameters.convertBytesFail"), e);
} else {
logger.fine(sm.getString("parameters.invalidChunk",
Integer.valueOf(nameStart), Integer.valueOf(nameEnd), null));
}
}
} else {
this.tmpName.setCharset(charset);
this.tmpValue.setCharset(charset);
this.tmpName.setBytes(bytes, nameStart, nameEnd - nameStart);
this.tmpValue.setBytes(bytes, valueStart, valueEnd - valueStart);
if (debug > 0)
try {
this.origName.append(bytes, nameStart, nameEnd - nameStart);
this.origValue.append(bytes, valueStart, valueEnd - valueStart);
}
catch (IOException ioe) {
logger.log(Level.SEVERE, sm.getString("parameters.copyFail"), ioe);
}
try
{
String name;
String name;
if (decodeName)
name = urlDecode(this.tmpName);
else
name = this.tmpName.toString();
String value;
String value;
if (decodeValue)
value = urlDecode(this.tmpValue);
else {
value = this.tmpValue.toString();
}
try
{
addParameter(name, value);
}
catch (IllegalStateException ise)
{
logger.warning(ise.getMessage());
break;
}
} catch (IOException e) {
decodeFailCount++;
if ((decodeFailCount == 1) || (debug > 0)) {
if (debug > 0) {
log(sm.getString("parameters.decodeFail.debug", this.origName.toString(), this.origValue.toString()), e);
}
else if (logger.isLoggable(Level.INFO)) {
logger.log(Level.INFO, sm.getString("parameters.decodeFail.info", this.tmpName.toString(), this.tmpValue.toString()), e);
}
}
}
this.tmpName.recycle();
this.tmpValue.recycle();
if (debug > 0) {
this.origName.recycle();
this.origValue.recycle();
}
}
}
if ((decodeFailCount > 1) && (debug <= 0))
logger.info(sm.getString("parameters.multipleDecodingFail", Integer.valueOf(decodeFailCount)));
}

How to download videos from youtube on java? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
how to download videos from youtube on Java?
need class(or piece of code) which describes how to do that.
Thank you.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.regex.Pattern;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.CookieStore;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.client.utils.URIUtils;
import org.apache.http.client.utils.URLEncodedUtils;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;
public class JavaYoutubeDownloader {
public static String newline = System.getProperty("line.separator");
private static final Logger log = Logger.getLogger(JavaYoutubeDownloader.class.getCanonicalName());
private static final Level defaultLogLevelSelf = Level.FINER;
private static final Level defaultLogLevel = Level.WARNING;
private static final Logger rootlog = Logger.getLogger("");
private static final String scheme = "http";
private static final String host = "www.youtube.com";
private static final Pattern commaPattern = Pattern.compile(",");
private static final Pattern pipePattern = Pattern.compile("\\|");
private static final char[] ILLEGAL_FILENAME_CHARACTERS = { '/', '\n', '\r', '\t', '\0', '\f', '`', '?', '*', '\\', '<', '>', '|', '\"', ':' };
private static void usage(String error) {
if (error != null) {
System.err.println("Error: " + error);
}
System.err.println("usage: JavaYoutubeDownload VIDEO_ID DESTINATION_DIRECTORY");
System.exit(-1);
}
public static void main(String[] args) {
if (args == null || args.length == 0) {
usage("Missing video id. Extract from http://www.youtube.com/watch?v=VIDEO_ID");
}
try {
setupLogging();
log.fine("Starting");
String videoId = null;
String outdir = ".";
// TODO Ghetto command line parsing
if (args.length == 1) {
videoId = args[0];
} else if (args.length == 2) {
videoId = args[0];
outdir = args[1];
}
int format = 18; // http://en.wikipedia.org/wiki/YouTube#Quality_and_codecs
String encoding = "UTF-8";
String userAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
File outputDir = new File(outdir);
String extension = getExtension(format);
play(videoId, format, encoding, userAgent, outputDir, extension);
} catch (Throwable t) {
t.printStackTrace();
}
log.fine("Finished");
}
private static String getExtension(int format) {
// TODO
return "mp4";
}
private static void play(String videoId, int format, String encoding, String userAgent, File outputdir, String extension) throws Throwable {
log.fine("Retrieving " + videoId);
List<NameValuePair> qparams = new ArrayList<NameValuePair>();
qparams.add(new BasicNameValuePair("video_id", videoId));
qparams.add(new BasicNameValuePair("fmt", "" + format));
URI uri = getUri("get_video_info", qparams);
CookieStore cookieStore = new BasicCookieStore();
HttpContext localContext = new BasicHttpContext();
localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet(uri);
httpget.setHeader("User-Agent", userAgent);
log.finer("Executing " + uri);
HttpResponse response = httpclient.execute(httpget, localContext);
HttpEntity entity = response.getEntity();
if (entity != null && response.getStatusLine().getStatusCode() == 200) {
InputStream instream = entity.getContent();
String videoInfo = getStringFromInputStream(encoding, instream);
if (videoInfo != null && videoInfo.length() > 0) {
List<NameValuePair> infoMap = new ArrayList<NameValuePair>();
URLEncodedUtils.parse(infoMap, new Scanner(videoInfo), encoding);
String token = null;
String downloadUrl = null;
String filename = videoId;
for (NameValuePair pair : infoMap) {
String key = pair.getName();
String val = pair.getValue();
log.finest(key + "=" + val);
if (key.equals("token")) {
token = val;
} else if (key.equals("title")) {
filename = val;
} else if (key.equals("fmt_url_map")) {
String[] formats = commaPattern.split(val);
for (String fmt : formats) {
String[] fmtPieces = pipePattern.split(fmt);
if (fmtPieces.length == 2) {
// in the end, download somethin!
downloadUrl = fmtPieces[1];
int pieceFormat = Integer.parseInt(fmtPieces[0]);
if (pieceFormat == format) {
// found what we want
downloadUrl = fmtPieces[1];
break;
}
}
}
}
}
filename = cleanFilename(filename);
if (filename.length() == 0) {
filename = videoId;
} else {
filename += "_" + videoId;
}
filename += "." + extension;
File outputfile = new File(outputdir, filename);
if (downloadUrl != null) {
downloadWithHttpClient(userAgent, downloadUrl, outputfile);
}
}
}
}
private static void downloadWithHttpClient(String userAgent, String downloadUrl, File outputfile) throws Throwable {
HttpGet httpget2 = new HttpGet(downloadUrl);
httpget2.setHeader("User-Agent", userAgent);
log.finer("Executing " + httpget2.getURI());
HttpClient httpclient2 = new DefaultHttpClient();
HttpResponse response2 = httpclient2.execute(httpget2);
HttpEntity entity2 = response2.getEntity();
if (entity2 != null && response2.getStatusLine().getStatusCode() == 200) {
long length = entity2.getContentLength();
InputStream instream2 = entity2.getContent();
log.finer("Writing " + length + " bytes to " + outputfile);
if (outputfile.exists()) {
outputfile.delete();
}
FileOutputStream outstream = new FileOutputStream(outputfile);
try {
byte[] buffer = new byte[2048];
int count = -1;
while ((count = instream2.read(buffer)) != -1) {
outstream.write(buffer, 0, count);
}
outstream.flush();
} finally {
outstream.close();
}
}
}
private static String cleanFilename(String filename) {
for (char c : ILLEGAL_FILENAME_CHARACTERS) {
filename = filename.replace(c, '_');
}
return filename;
}
private static URI getUri(String path, List<NameValuePair> qparams) throws URISyntaxException {
URI uri = URIUtils.createURI(scheme, host, -1, "/" + path, URLEncodedUtils.format(qparams, "UTF-8"), null);
return uri;
}
private static void setupLogging() {
changeFormatter(new Formatter() {
#Override
public String format(LogRecord arg0) {
return arg0.getMessage() + newline;
}
});
explicitlySetAllLogging(Level.FINER);
}
private static void changeFormatter(Formatter formatter) {
Handler[] handlers = rootlog.getHandlers();
for (Handler handler : handlers) {
handler.setFormatter(formatter);
}
}
private static void explicitlySetAllLogging(Level level) {
rootlog.setLevel(Level.ALL);
for (Handler handler : rootlog.getHandlers()) {
handler.setLevel(defaultLogLevelSelf);
}
log.setLevel(level);
rootlog.setLevel(defaultLogLevel);
}
private static String getStringFromInputStream(String encoding, InputStream instream) throws UnsupportedEncodingException, IOException {
Writer writer = new StringWriter();
char[] buffer = new char[1024];
try {
Reader reader = new BufferedReader(new InputStreamReader(instream, encoding));
int n;
while ((n = reader.read(buffer)) != -1) {
writer.write(buffer, 0, n);
}
} finally {
instream.close();
}
String result = writer.toString();
return result;
}
}
/**
* <pre>
* Exploded results from get_video_info:
*
* fexp=90...
* allow_embed=1
* fmt_stream_map=35|http://v9.lscache8...
* fmt_url_map=35|http://v9.lscache8...
* allow_ratings=1
* keywords=Stefan Molyneux,Luke Bessey,anarchy,stateless society,giant stone cow,the story of our unenslavement,market anarchy,voluntaryism,anarcho capitalism
* track_embed=0
* fmt_list=35/854x480/9/0/115,34/640x360/9/0/115,18/640x360/9/0/115,5/320x240/7/0/0
* author=lukebessey
* muted=0
* length_seconds=390
* plid=AA...
* ftoken=null
* status=ok
* watermark=http://s.ytimg.com/yt/swf/logo-vfl_bP6ud.swf,http://s.ytimg.com/yt/swf/hdlogo-vfloR6wva.swf
* timestamp=12...
* has_cc=False
* fmt_map=35/854x480/9/0/115,34/640x360/9/0/115,18/640x360/9/0/115,5/320x240/7/0/0
* leanback_module=http://s.ytimg.com/yt/swfbin/leanback_module-vflJYyeZN.swf
* hl=en_US
* endscreen_module=http://s.ytimg.com/yt/swfbin/endscreen-vflk19iTq.swf
* vq=auto
* avg_rating=5.0
* video_id=S6IZP3yRJ9I
* token=vPpcFNh...
* thumbnail_url=http://i4.ytimg.com/vi/S6IZP3yRJ9I/default.jpg
* title=The Story of Our Unenslavement - Animated
* </pre>
*/
I know i am answering late. But this code may useful for some one. So i am attaching it here.
Use the following java code to download the videos from YouTube.
package com.mycompany.ytd;
import java.io.File;
import java.net.URL;
import com.github.axet.vget.VGet;
/**
*
* #author Manindar
*/
public class YTD {
public static void main(String[] args) {
try {
String url = "https://www.youtube.com/watch?v=s10ARdfQUOY";
String path = "D:\\Manindar\\YTD\\";
VGet v = new VGet(new URL(url), new File(path));
v.download();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
Add the below Dependency in your POM.XML file
<dependency>
<groupId>com.github.axet</groupId>
<artifactId>vget</artifactId>
<version>1.1.33</version>
</dependency>
Hope this will be useful.
Regarding the format (mp4 or flv) decide which URL you want to use. Then use this tutorial to download the video and save it into a local directory.
Ref :Youtube Video Download (Android/Java)
Edit 3
You can use the Lib : https://github.com/HaarigerHarald/android-youtubeExtractor
Ex :
String youtubeLink = "http://youtube.com/watch?v=xxxx";
new YouTubeExtractor(this) {
#Override
public void onExtractionComplete(SparseArray<YtFile> ytFiles, VideoMeta vMeta) {
if (ytFiles != null) {
int itag = 22;
String downloadUrl = ytFiles.get(itag).getUrl();
}
}
}.extract(youtubeLink, true, true);
They decipherSignature using :
private boolean decipherSignature(final SparseArray<String> encSignatures) throws IOException {
// Assume the functions don't change that much
if (decipherFunctionName == null || decipherFunctions == null) {
String decipherFunctUrl = "https://s.ytimg.com/yts/jsbin/" + decipherJsFileName;
BufferedReader reader = null;
String javascriptFile;
URL url = new URL(decipherFunctUrl);
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
urlConnection.setRequestProperty("User-Agent", USER_AGENT);
try {
reader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
StringBuilder sb = new StringBuilder("");
String line;
while ((line = reader.readLine()) != null) {
sb.append(line);
sb.append(" ");
}
javascriptFile = sb.toString();
} finally {
if (reader != null)
reader.close();
urlConnection.disconnect();
}
if (LOGGING)
Log.d(LOG_TAG, "Decipher FunctURL: " + decipherFunctUrl);
Matcher mat = patSignatureDecFunction.matcher(javascriptFile);
if (mat.find()) {
decipherFunctionName = mat.group(1);
if (LOGGING)
Log.d(LOG_TAG, "Decipher Functname: " + decipherFunctionName);
Pattern patMainVariable = Pattern.compile("(var |\\s|,|;)" + decipherFunctionName.replace("$", "\\$") +
"(=function\\((.{1,3})\\)\\{)");
String mainDecipherFunct;
mat = patMainVariable.matcher(javascriptFile);
if (mat.find()) {
mainDecipherFunct = "var " + decipherFunctionName + mat.group(2);
} else {
Pattern patMainFunction = Pattern.compile("function " + decipherFunctionName.replace("$", "\\$") +
"(\\((.{1,3})\\)\\{)");
mat = patMainFunction.matcher(javascriptFile);
if (!mat.find())
return false;
mainDecipherFunct = "function " + decipherFunctionName + mat.group(2);
}
int startIndex = mat.end();
for (int braces = 1, i = startIndex; i < javascriptFile.length(); i++) {
if (braces == 0 && startIndex + 5 < i) {
mainDecipherFunct += javascriptFile.substring(startIndex, i) + ";";
break;
}
if (javascriptFile.charAt(i) == '{')
braces++;
else if (javascriptFile.charAt(i) == '}')
braces--;
}
decipherFunctions = mainDecipherFunct;
// Search the main function for extra functions and variables
// needed for deciphering
// Search for variables
mat = patVariableFunction.matcher(mainDecipherFunct);
while (mat.find()) {
String variableDef = "var " + mat.group(2) + "={";
if (decipherFunctions.contains(variableDef)) {
continue;
}
startIndex = javascriptFile.indexOf(variableDef) + variableDef.length();
for (int braces = 1, i = startIndex; i < javascriptFile.length(); i++) {
if (braces == 0) {
decipherFunctions += variableDef + javascriptFile.substring(startIndex, i) + ";";
break;
}
if (javascriptFile.charAt(i) == '{')
braces++;
else if (javascriptFile.charAt(i) == '}')
braces--;
}
}
// Search for functions
mat = patFunction.matcher(mainDecipherFunct);
while (mat.find()) {
String functionDef = "function " + mat.group(2) + "(";
if (decipherFunctions.contains(functionDef)) {
continue;
}
startIndex = javascriptFile.indexOf(functionDef) + functionDef.length();
for (int braces = 0, i = startIndex; i < javascriptFile.length(); i++) {
if (braces == 0 && startIndex + 5 < i) {
decipherFunctions += functionDef + javascriptFile.substring(startIndex, i) + ";";
break;
}
if (javascriptFile.charAt(i) == '{')
braces++;
else if (javascriptFile.charAt(i) == '}')
braces--;
}
}
if (LOGGING)
Log.d(LOG_TAG, "Decipher Function: " + decipherFunctions);
decipherViaWebView(encSignatures);
if (CACHING) {
writeDeciperFunctToChache();
}
} else {
return false;
}
} else {
decipherViaWebView(encSignatures);
}
return true;
}
Now with use of this library High Quality Videos Lossing Audio so i use the MediaMuxer for Murging Audio and Video for Final Output
Edit 1
https://stackoverflow.com/a/15240012/9909365
Why the previous answer not worked
Pattern p2 = Pattern.compile("sig=(.*?)[&]");
Matcher m2 = p2.matcher(url);
String sig = null;
if (m2.find()) {
sig = m2.group(1);
}
As of November 2016, this is a little rough around the edges, but
displays the basic principle. The url_encoded_fmt_stream_map today
does not have a space after the colon (better make this optional) and
"sig" has been changed to "signature"
and while i am debuging the code i found the new keyword its
signature&s in many video's URL
here edited answer
private static final HashMap<String, Meta> typeMap = new HashMap<String, Meta>();
initTypeMap(); call first
class Meta {
public String num;
public String type;
public String ext;
Meta(String num, String ext, String type) {
this.num = num;
this.ext = ext;
this.type = type;
}
}
class Video {
public String ext = "";
public String type = "";
public String url = "";
Video(String ext, String type, String url) {
this.ext = ext;
this.type = type;
this.url = url;
}
}
public ArrayList<Video> getStreamingUrisFromYouTubePage(String ytUrl)
throws IOException {
if (ytUrl == null) {
return null;
}
// Remove any query params in query string after the watch?v=<vid> in
// e.g.
// http://www.youtube.com/watch?v=0RUPACpf8Vs&feature=youtube_gdata_player
int andIdx = ytUrl.indexOf('&');
if (andIdx >= 0) {
ytUrl = ytUrl.substring(0, andIdx);
}
// Get the HTML response
/* String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1)";*/
/* HttpClient client = new DefaultHttpClient();
client.getParams().setParameter(CoreProtocolPNames.USER_AGENT,
userAgent);
HttpGet request = new HttpGet(ytUrl);
HttpResponse response = client.execute(request);*/
String html = "";
HttpsURLConnection c = (HttpsURLConnection) new URL(ytUrl).openConnection();
c.setRequestMethod("GET");
c.setDoOutput(true);
c.connect();
InputStream in = c.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
str.append(line.replace("\\u0026", "&"));
}
in.close();
html = str.toString();
// Parse the HTML response and extract the streaming URIs
if (html.contains("verify-age-thumb")) {
Log.e("Downloader", "YouTube is asking for age verification. We can't handle that sorry.");
return null;
}
if (html.contains("das_captcha")) {
Log.e("Downloader", "Captcha found, please try with different IP address.");
return null;
}
Pattern p = Pattern.compile("stream_map\":\"(.*?)?\"");
// Pattern p = Pattern.compile("/stream_map=(.[^&]*?)\"/");
Matcher m = p.matcher(html);
List<String> matches = new ArrayList<String>();
while (m.find()) {
matches.add(m.group());
}
if (matches.size() != 1) {
Log.e("Downloader", "Found zero or too many stream maps.");
return null;
}
String urls[] = matches.get(0).split(",");
HashMap<String, String> foundArray = new HashMap<String, String>();
for (String ppUrl : urls) {
String url = URLDecoder.decode(ppUrl, "UTF-8");
Log.e("URL","URL : "+url);
Pattern p1 = Pattern.compile("itag=([0-9]+?)[&]");
Matcher m1 = p1.matcher(url);
String itag = null;
if (m1.find()) {
itag = m1.group(1);
}
Pattern p2 = Pattern.compile("signature=(.*?)[&]");
Matcher m2 = p2.matcher(url);
String sig = null;
if (m2.find()) {
sig = m2.group(1);
} else {
Pattern p23 = Pattern.compile("signature&s=(.*?)[&]");
Matcher m23 = p23.matcher(url);
if (m23.find()) {
sig = m23.group(1);
}
}
Pattern p3 = Pattern.compile("url=(.*?)[&]");
Matcher m3 = p3.matcher(ppUrl);
String um = null;
if (m3.find()) {
um = m3.group(1);
}
if (itag != null && sig != null && um != null) {
Log.e("foundArray","Adding Value");
foundArray.put(itag, URLDecoder.decode(um, "UTF-8") + "&"
+ "signature=" + sig);
}
}
Log.e("foundArray","Size : "+foundArray.size());
if (foundArray.size() == 0) {
Log.e("Downloader", "Couldn't find any URLs and corresponding signatures");
return null;
}
ArrayList<Video> videos = new ArrayList<Video>();
for (String format : typeMap.keySet()) {
Meta meta = typeMap.get(format);
if (foundArray.containsKey(format)) {
Video newVideo = new Video(meta.ext, meta.type,
foundArray.get(format));
videos.add(newVideo);
Log.d("Downloader", "YouTube Video streaming details: ext:" + newVideo.ext
+ ", type:" + newVideo.type + ", url:" + newVideo.url);
}
}
return videos;
}
private class YouTubePageStreamUriGetter extends AsyncTask<String, String, ArrayList<Video>> {
ProgressDialog progressDialog;
#Override
protected void onPreExecute() {
super.onPreExecute();
progressDialog = ProgressDialog.show(webViewActivity.this, "",
"Connecting to YouTube...", true);
}
#Override
protected ArrayList<Video> doInBackground(String... params) {
ArrayList<Video> fVideos = new ArrayList<>();
String url = params[0];
try {
ArrayList<Video> videos = getStreamingUrisFromYouTubePage(url);
/* Log.e("Downloader","Size of Video : "+videos.size());*/
if (videos != null && !videos.isEmpty()) {
for (Video video : videos)
{
Log.e("Downloader", "ext : " + video.ext);
if (video.ext.toLowerCase().contains("mp4") || video.ext.toLowerCase().contains("3gp") || video.ext.toLowerCase().contains("flv") || video.ext.toLowerCase().contains("webm")) {
ext = video.ext.toLowerCase();
fVideos.add(new Video(video.ext,video.type,video.url));
}
}
return fVideos;
}
} catch (Exception e) {
e.printStackTrace();
Log.e("Downloader", "Couldn't get YouTube streaming URL", e);
}
Log.e("Downloader", "Couldn't get stream URI for " + url);
return null;
}
#Override
protected void onPostExecute(ArrayList<Video> streamingUrl) {
super.onPostExecute(streamingUrl);
progressDialog.dismiss();
if (streamingUrl != null) {
if (!streamingUrl.isEmpty()) {
//Log.e("Steaming Url", "Value : " + streamingUrl);
for (int i = 0; i < streamingUrl.size(); i++) {
Video fX = streamingUrl.get(i);
Log.e("Founded Video", "URL : " + fX.url);
Log.e("Founded Video", "TYPE : " + fX.type);
Log.e("Founded Video", "EXT : " + fX.ext);
}
//new ProgressBack().execute(new String[]{streamingUrl, filename + "." + ext});
}
}
}
}
public void initTypeMap()
{
typeMap.put("13", new Meta("13", "3GP", "Low Quality - 176x144"));
typeMap.put("17", new Meta("17", "3GP", "Medium Quality - 176x144"));
typeMap.put("36", new Meta("36", "3GP", "High Quality - 320x240"));
typeMap.put("5", new Meta("5", "FLV", "Low Quality - 400x226"));
typeMap.put("6", new Meta("6", "FLV", "Medium Quality - 640x360"));
typeMap.put("34", new Meta("34", "FLV", "Medium Quality - 640x360"));
typeMap.put("35", new Meta("35", "FLV", "High Quality - 854x480"));
typeMap.put("43", new Meta("43", "WEBM", "Low Quality - 640x360"));
typeMap.put("44", new Meta("44", "WEBM", "Medium Quality - 854x480"));
typeMap.put("45", new Meta("45", "WEBM", "High Quality - 1280x720"));
typeMap.put("18", new Meta("18", "MP4", "Medium Quality - 480x360"));
typeMap.put("22", new Meta("22", "MP4", "High Quality - 1280x720"));
typeMap.put("37", new Meta("37", "MP4", "High Quality - 1920x1080"));
typeMap.put("33", new Meta("38", "MP4", "High Quality - 4096x230"));
}
Edit 2:
Some time This Code Not worked proper
Same-origin policy
https://en.wikipedia.org/wiki/Same-origin_policy
https://en.wikipedia.org/wiki/Cross-origin_resource_sharing
problem of Same-origin policy. Essentially, you cannot download this file from www.youtube.com because they are different domains. A workaround of this problem is [CORS][1].
Ref : https://superuser.com/questions/773719/how-do-all-of-these-save-video-from-youtube-services-work/773998#773998
url_encoded_fmt_stream_map // traditional: contains video and audio stream
adaptive_fmts // DASH: contains video or audio stream
Each of these is a comma separated array of what I would call "stream objects". Each "stream object" will contain values like this
url // direct HTTP link to a video
itag // code specifying the quality
s // signature, security measure to counter downloading
Each URL will be encoded so you will need to decode them. Now the tricky part.
YouTube has at least 3 security levels for their videos
unsecured // as expected, you can download these with just the unencoded URL
s // see below
RTMPE // uses "rtmpe://" protocol, no known method for these
The RTMPE videos are typically used on official full length movies, and are protected with SWF Verification Type 2. This has been around since 2011 and has yet to be reverse engineered.
The type "s" videos are the most difficult that can actually be downloaded. You will typcially see these on VEVO videos and the like. They start with a signature such as
AA5D05FA7771AD4868BA4C977C3DEAAC620DE020E.0F421820F42978A1F8EAFCDAC4EF507DB5
Then the signature is scrambled with a function like this
function mo(a) {
a = a.split("");
a = lo.rw(a, 1);
a = lo.rw(a, 32);
a = lo.IC(a, 1);
a = lo.wS(a, 77);
a = lo.IC(a, 3);
a = lo.wS(a, 77);
a = lo.IC(a, 3);
a = lo.wS(a, 44);
return a.join("")
}
This function is dynamic, it typically changes every day. To make it more difficult the function is hosted at a URL such as
http://s.ytimg.com/yts/jsbin/html5player-en_US-vflycBCEX.js
this introduces the problem of Same-origin policy. Essentially, you cannot download this file from www.youtube.com because they are different domains. A workaround of this problem is CORS. With CORS, s.ytimg.com could add this header
Access-Control-Allow-Origin: http://www.youtube.com
and it would allow the JavaScript to download from www.youtube.com. Of course they do not do this. A workaround for this workaround is to use a CORS proxy. This is a proxy that responds with the following header to all requests
Access-Control-Allow-Origin: *
So, now that you have proxied your JS file, and used the function to scramble the signature, you can use that in the querystring to download a video.
ytd2 is a fully functional YouTube video downloader. Check out its source code if you want to see how it's done.
Alternatively, you can also call an external process like youtube-dl to do the job. This is probably the easiest solution but it isn't in "pure" Java.

Categories

Resources