Getting HTML From a Webpage


So let’s say you are using a SharePoint / Commerce Server website, and you’re authenticated through a cookie token. Let’s say you have a page set up whose output you need, possibly for generating e-mail, uploading somewhere, whatever. There are several ways to get that HTML:

  1. Create a hidden iFrame, and use a server callback to send the contents back to you.
  2. Use WebClient to go get it.
  3. Roll your own HTTP client.
  4. Something else I didn’t think of off the top of my head.

Let’s be honest… (1) is ugly as sin. And if we try (2), implemented as below:

    WebClient wc = new WebClient();
    string html  = wc.DownloadString("http://mysight.com/PageINeed.ascx");

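It doesn’t work. Why not? Because WebClient issues an entirely new request of its own, as if from a fresh browser, instead of reusing the current user’s session. No cookies, no authentication… nothing, so the site sees an anonymous visitor. I ran across a very quick and dirty way to solve this: subclass WebClient and hand it the cookies from the current request.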

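    using System;
    using System.Net;
    using System.Web;

    // A WebClient that sends a caller-supplied set of cookies with every request.
    public class MasqueradeWebClient : WebClient
    {
        private readonly CookieContainer m_container = new CookieContainer();

        // Copies an ASP.NET request cookie into the container used for outgoing requests.
        public void AddCookie( HttpCookie httpCookie )
        {
            Cookie newCookie = new Cookie();

            // Request cookies carry no domain, so use the current request's host.
            // The container only sends a cookie to URLs on a matching host.
            newCookie.Domain   = HttpContext.Current.Request.Url.Host;
            newCookie.Expires  = httpCookie.Expires;
            newCookie.Name     = httpCookie.Name;
            newCookie.Path     = httpCookie.Path;
            newCookie.Secure   = httpCookie.Secure;
            newCookie.Value    = httpCookie.Value;

            m_container.Add(newCookie);
        }

        // Attaches the cookie container to every HTTP request this client creates.
        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest request = base.GetWebRequest(address);
            HttpWebRequest httpRequest = request as HttpWebRequest;
            if (httpRequest != null)
            {
                httpRequest.CookieContainer = m_container;
            }
            return request;
        }
    }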

Using the MasqueradeWebClient, we can now do something like this:

    MasqueradeWebClient mwc = new MasqueradeWebClient();

    for (int i = 0; i < Context.Request.Cookies.Count; i++)
    {
        mwc.AddCookie( Context.Request.Cookies.Get(i) );
    }

    string html  = mwc.DownloadString("http://mysight.com/PageINeed.ascx");

And voilà! Our HTML downloads as expected, because the request now carries the current user’s authentication cookies.
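
One caveat is easier to see in isolation: a CookieContainer only sends a cookie to URLs whose host matches the cookie’s Domain, which is why AddCookie stamps every cookie with the current request’s host. Here is a minimal console sketch of that matching rule. The class name CookieMatchDemo, the hosts example.com and other.test, and the cookie auth=abc123 are all made up for illustration.

```csharp
using System;
using System.Net;

public static class CookieMatchDemo
{
    // Returns the Cookie header the container would send to requestUrl
    // after storing a single cookie scoped to cookieHost.
    public static string HeaderFor(string cookieHost, string requestUrl)
    {
        CookieContainer container = new CookieContainer();
        container.Add(new Cookie("auth", "abc123", "/", cookieHost));
        return container.GetCookieHeader(new Uri(requestUrl));
    }

    public static void Main()
    {
        // Same host as the cookie's domain: the cookie is included.
        Console.WriteLine("same host:  '" + HeaderFor("example.com", "http://example.com/page") + "'");

        // Different host: nothing is sent.
        Console.WriteLine("other host: '" + HeaderFor("example.com", "http://other.test/page") + "'");
    }
}
```

In practice this means the MasqueradeWebClient trick should only be expected to work when the page you download lives on the same host as the page making the request.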
