Sunday, June 28, 2009

Regular expression

In some cases you need to remove HTML tags from your text.
A regular expression can strip the tags, including the tag names, and leave plain text.

Remove HTML Tags from HTML string
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;
#region ClearHTMLTags
/// <summary>
/// Removes HTML tags from an HTML string.
/// </summary>
/// <param name="strHTML">HTML as text (not encoded)</param>
/// <param name="intWorkFlow">
/// An integer that selects which filters run:
/// 0 runs only the RegExp filter,
/// 1 runs only the HTML source render filter,
/// 2 runs both the RegExp filter and the HTML source render filter,
/// any value greater than 2 behaves like 0.
/// </param>
/// <returns>The text with HTML tags stripped off</returns>
/// <remarks>Author: Narendra Tiwari, Date: 06 Feb 2007</remarks>
/// <example>
/// HtmlOperations operations = new HtmlOperations();
/// strFileData = operations.ClearHTMLTags(File.ReadAllText(filePath), 0);
/// </example>
public string ClearHTMLTags(string strHTML, int intWorkFlow)
{
    Regex regEx = null;
    string strTagLess = strHTML;

    // 1. Remove HTML tags (skipped only when intWorkFlow is 1).
    if (intWorkFlow != 1)
    {
        // This pattern matches any HTML tag: "<", then any run of
        // characters other than ">", then the closing ">".
        regEx = new Regex("<[^>]*>");

        // Replace every match with an empty string, so all HTML tags are stripped.
        strTagLess = regEx.Replace(strTagLess, "");
    }

    // 2. Remove rogue leftovers, or render the source as HTML.
    // We *might* still have rogue < and > characters, so make sure
    // that those that remain are changed into HTML character entities.
    if (intWorkFlow > 0 && intWorkFlow < 3)
    {
        regEx = new Regex("[<]"); // matches a single <
        strTagLess = regEx.Replace(strTagLess, "&lt;");
        regEx = new Regex("[>]"); // matches a single >
        strTagLess = regEx.Replace(strTagLess, "&gt;");
    }

    // 3. Return the stripped text.
    return strTagLess;
}
#endregion

Example:
Your string: Hello World.

Output: Hello World.
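
To make the three work flow values easier to compare, here is a minimal, self-contained sketch that could be run as a console program. It is not the original author's class: the names TagStripDemo and Clear and the sample string are assumptions made up for illustration, and only the tag pattern and the 0/1/2 logic are taken from the code above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the name is not from the original post.
public static class TagStripDemo
{
    // Same tag pattern as the post: "<", any characters except ">", then ">".
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    // Mirrors the work flows: 0 strips tags, 1 encodes angle brackets, 2 does both.
    public static string Clear(string html, int workFlow)
    {
        string text = html;

        if (workFlow != 1)
        {
            // Strip every complete tag.
            text = TagPattern.Replace(text, "");
        }

        if (workFlow > 0 && workFlow < 3)
        {
            // Encode any angle brackets that are still present.
            text = text.Replace("<", "&lt;").Replace(">", "&gt;");
        }

        return text;
    }

    public static void Main()
    {
        // Assumed sample input: one bold tag pair and one stray "<".
        string sample = "<b>Hello</b> World, 1 < 2.";

        Console.WriteLine(Clear(sample, 0)); // Hello World, 1 < 2.
        Console.WriteLine(Clear(sample, 1)); // &lt;b&gt;Hello&lt;/b&gt; World, 1 &lt; 2.
        Console.WriteLine(Clear(sample, 2)); // Hello World, 1 &lt; 2.
    }
}
```

One thing to keep in mind with this pattern: a stray "<" that is followed later in the text by a ">" is treated as a tag, so everything between the two characters is removed as well. For text that may contain such characters, work flow 1 (encoding only) is probably the safer choice.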

This is especially useful when you display data that users have formatted. For example, suppose you have a forum where the comments section offers editing options such as bold and underline, and the posted comments are displayed above.
In that case, when a comment is posted, the code above removes all HTML tags and produces plain text.
