C# - getting text-nodes of HtmlDocument

- March 15, 2014

After the WebBrowser document loads, the document contains something like:

<div id="toextract">
    <div>this</div>
    <div>is</div>
    sample
    <div>text</div>
    <div>want to</div>
    <div>extract</div>
</div>

I want to extract the InnerHtml of these elements so that the output is:

this is sample text want to extract

but I get this instead:

this is text want to extract

because bare words such as "sample" are not inside an HtmlElement. My code:

string ex = "";
HtmlElement elem = webBrowser1.Document.GetElementById("toextract");
HtmlElementCollection elems = elem.All;
// Only element nodes are visited here, so bare text is never appended.
for (int i = 0; i < elems.Count; i++)
    ex += elems[i].InnerHtml + " ";

My code skips text nodes (nodes with no tag). I think this is because they are not considered HtmlElements. How can I include them in the extracted text?

Simply fetch the text with InnerText, which includes the text nodes as well:

elem.InnerText

and remove the line feeds like this:

elem.InnerText.Replace(System.Environment.NewLine, " ")
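If the returned text also contains leading spaces or several blank runs in a row, replacing only the newline string can leave double spaces behind. A minimal sketch of a more forgiving cleanup, assuming a hypothetical helper called CollapseWhitespace (my own name, not part of WinForms) and assuming InnerText may separate the child elements with line breaks:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextCleanup
{
    // Hypothetical helper, not a framework API: collapses every run of
    // whitespace (spaces, tabs, CR, LF) into one space and trims the ends.
    public static string CollapseWhitespace(string text)
    {
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

You would then call it as TextCleanup.CollapseWhitespace(elem.InnerText). For an input such as "this\r\nis\r\n sample \r\ntext" it returns "this is sample text".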
