Wednesday, November 21, 2007

Scan a web page by URL with HttpWebRequest

After deploying code and HTML pages to Palladium Group, we found many broken links in the pages, and reworking them became a big problem. So I created a console application that runs in the background, fetches the content of every URL, and checks whether each URL is working by inspecting the response. If a URL is not working, the application sends a mail to the content developers.

This is the method that scans a URL:

// Requires: using System; using System.IO; using System.Net;
private string GetContent(string url)
{
    try
    {
        // Place the web request to the server by specifying the URL
        HttpWebRequest wReq = (HttpWebRequest)WebRequest.Create(url);
        // No need for a persistent connection
        wReq.KeepAlive = false;

        // Dispose the response, its stream and the reader so the connection is released
        using (HttpWebResponse wResp = (HttpWebResponse)wReq.GetResponse())
        using (Stream rStream = wResp.GetResponseStream())
        using (StreamReader reader = new StreamReader(rStream))
        {
            // Read the whole body; StreamReader does the buffering and text decoding
            string myOutPut = reader.ReadToEnd();
            if (myOutPut.IndexOf("corp.thepalladiumgroup") != -1)
            {
                return "This Content contains corp.thepalladiumgroup link";
            }
        }
    }
    catch (WebException ex)
    {
        // HTTP errors (404, 500, ...) arrive as a WebException that carries the response,
        // so check the real status code instead of searching the message text
        HttpWebResponse errResp = ex.Response as HttpWebResponse;
        if (errResp != null)
        {
            int code = (int)errResp.StatusCode;
            errResp.Close();
            if (code == 404)
            {
                return " Error 404 found : Details - " + ex.Message;
            }
            if (code == 500)
            {
                return " Error 500 found : Details - " + ex.Message;
            }
        }
        // Other HTTP codes, DNS failures, timeouts, ...
        return "General Exception : " + ex.Message;
    }
    catch (Exception ex)
    {
        // Anything else, e.g. a malformed URL
        return "General Exception : " + ex.Message;
    }
    // An empty string means the page loaded and no problem was found
    return "";
}

In the actual code I store all the URLs in the config file, read the items from the config, loop through all the URL items and check each response. If there is any error, I send a mail.
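The loop described above could look roughly like the following. This is only a sketch, not the code I actually run: it assumes the URLs are kept as one comma-separated value (for example in an appSettings entry with a hypothetical key such as "UrlsToCheck"), and the class name LinkChecker, its helper methods, the mail subject and the SMTP host and addresses are all hypothetical placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mail;
using System.Text;

// Hypothetical helper class; none of these names come from the original application.
static class LinkChecker
{
    // Splits a comma-separated URL list (e.g. the value of a hypothetical
    // "UrlsToCheck" appSettings entry) into trimmed, non-empty URLs.
    public static List<string> ParseUrlList(string configValue)
    {
        List<string> urls = new List<string>();
        if (configValue == null) return urls;
        foreach (string part in configValue.Split(','))
        {
            string url = part.Trim();
            if (url.Length > 0) urls.Add(url);
        }
        return urls;
    }

    // Runs the checker (e.g. GetContent) on every URL and keeps the non-empty
    // results, because an empty string means the page loaded without a problem.
    public static List<string> FindProblems(IEnumerable<string> urls, Func<string, string> check)
    {
        List<string> problems = new List<string>();
        foreach (string url in urls)
        {
            string result = check(url);
            if (!string.IsNullOrEmpty(result))
                problems.Add(url + " -> " + result.Trim());
        }
        return problems;
    }

    // Joins the problems into a plain-text mail body, one line per URL.
    public static string BuildReport(List<string> problems)
    {
        StringBuilder body = new StringBuilder();
        body.AppendLine("The following URLs need attention:");
        foreach (string p in problems)
            body.AppendLine(p);
        return body.ToString();
    }

    // Mails the report; the SMTP host and addresses are placeholders supplied by the caller.
    public static void SendReport(string body, string smtpHost, string from, string to)
    {
        using (MailMessage msg = new MailMessage(from, to, "Broken links found", body))
        {
            SmtpClient client = new SmtpClient(smtpHost);
            client.Send(msg);
        }
    }
}
```

From inside the class that defines GetContent, the method can be passed directly as the checker, e.g. LinkChecker.FindProblems(LinkChecker.ParseUrlList(value), GetContent), and SendReport is called only when the returned list is not empty, so the content developers get one mail per run instead of one per broken link.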