C# - Getting text nodes of an HtmlDocument
After the WebBrowser document loads, the document contains something like this:
<div id="toextract"> <div>this</div> <div>is</div> sample <div>text</div> <div>want to</div> <div>extract</div> </div>
I want to extract the inner HTML of these elements, so that the output would be:
this is sample text want to extract
But I get this:
this is text want to extract
because the word "sample" is not inside an HtmlElement of its own. My code:
    string ex = "";
    HtmlElement elem = webBrowser1.Document.GetElementById("toextract");
    HtmlElementCollection elems = elem.All;
    for (int i = 0; i < elems.Count; i++)
        ex += elems[i].InnerHtml + " ";
My code skips text nodes (nodes with no tag). I think that is because they are not considered HtmlElements. How can I include them in the extracted text?
Simply fetch the text with:
    elem.InnerText
and remove the line feeds like this:
    elem.InnerText.Replace(System.Environment.NewLine, " ")
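Replacing System.Environment.NewLine only catches the "\r\n" sequence on Windows, so lone "\n" characters or doubled spaces could survive. As a minimal sketch, assuming InnerText returns the words separated by some mix of spaces and line breaks, the text could be normalized with a regular expression instead. The TextCleanup class and CollapseWhitespace method are hypothetical names used for illustration, not part of the WebBrowser API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class TextCleanup
{
    // Replaces every run of whitespace (spaces, tabs, "\r\n", "\n")
    // with a single space and trims both ends.
    public static string CollapseWhitespace(string text)
    {
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

With the element from the question, it would be called as: string ex = TextCleanup.CollapseWhitespace(elem.InnerText);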