C# - Getting the text nodes of an HtmlDocument


After the WebBrowser document loads, the document contains something like:

<div id="toextract">
    <div>this</div>
    <div>is</div>
    sample
    <div>text</div>
    I
    <div>want to</div>
    <div>extract</div>
</div>

I want to extract the InnerHtml of these elements so that the output is:

this is sample text I want to extract

but I get this:

this is text want to extract

because the words "I" and "sample" are not inside an HtmlElement. My code:

string ex = "";
HtmlElement elem = webBrowser1.Document.GetElementById("toextract");
HtmlElementCollection elems = elem.All;
for (int i = 0; i < elems.Count; i++)
    ex += elems[i].InnerHtml + " ";

My code skips text nodes (nodes with no tag). I think this is because they are not considered HtmlElements. How can I include them in the extracted text?

Simply fetch the text of the container element, which includes its text nodes:

elem.InnerText

and remove the line feeds like this:

elem.InnerText.Replace(System.Environment.NewLine, " ")
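
Replacing only the line feeds may still leave runs of spaces or stray leading and trailing whitespace, depending on how the browser renders the markup. As a minimal sketch of one way to tidy that up, the helper below collapses every run of whitespace into a single space. The names TextCleanup and CollapseWhitespace are made up for this example and are not part of the WebBrowser API; only Regex.Replace and string.Trim are standard .NET calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of System.Windows.Forms.
public static class TextCleanup
{
    // Collapses each run of whitespace (spaces, tabs, CR, LF) into a
    // single space and trims the ends. Returns "" for null or empty input.
    public static string CollapseWhitespace(string text)
    {
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Assuming elem is the HtmlElement fetched with GetElementById("toextract"), it could be used as:

string ex = TextCleanup.CollapseWhitespace(elem.InnerText);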

