jastrachan | 29 Apr 09:31 2004

Re: reading a node graph

On 29 Apr 2004, at 03:26, Dion Almaer wrote:
> To get this I would do:
> --------------------------------------------------------
> builder = NodeBuilder.newInstance();
> html = builder.html {
>  body {
>   h1("Groovy Baby!")
>  }
> }
> println(html.body.h1[0].value())
> --------------------------------------------------------
> But it would be nice to have something cleaner,

I'm not sure how much neater we can get, though I'm open to new ideas. 
A bit of background...

html.body.h1 is walking the node tree and so (talking generically) in 
XML there could be 0..N body elements inside html and 0..N h1 elements 
inside each body.

So html.body.h1 returns a collection of nodes. You can nest the 
navigations as deep as you like; you always get 1 flattened list. Hence 
to get just the first node it's

     html.body.h1[0]

This is the same irrespective of how many 1-many relationships you walk.
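
A minimal sketch of that flattening, assuming a Groovy where new NodeBuilder() builds groovy.util.Node trees. The two body elements and the heading texts are made up purely for illustration (it isn't valid HTML):

```groovy
// Two body elements, each with its own h1 children (illustrative only).
def html = new NodeBuilder().html {
    body {
        h1("First")
        h1("Second")
    }
    body {
        h1("Third")
    }
}

// Walking two 0..N relationships still yields one flat list of h1 nodes.
def headings = html.body.h1

// Indexing the flat list gives a single node.
def first = html.body.h1[0]

println headings.size()
println first.text()
```

The list is flat: there is no list-per-body nesting to unpick before indexing.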

To get a list of all h1 nodes it's

     html.body.h1

If we then want to filter the collection we can do

     html.body.h1.findAll { it.text().contains("cheese") }
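
As a rough, self-contained version of that filter (the second heading is invented so that something matches "cheese"):

```groovy
def html = new NodeBuilder().html {
    body {
        h1("Groovy Baby!")
        h1("More cheese please")   // hypothetical heading, added for the example
    }
}

// findAll keeps only the nodes for which the closure returns true.
def cheesy = html.body.h1.findAll { it.text().contains("cheese") }

println cheesy*.text()
```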

Getting back to the expression in question, to get the value of the 
first node it's

     html.body.h1[0].value()

or for the text

     html.body.h1[0].text()

(elements can contain text + nodes, so if you just want the text then 
use the text() method).

In terms of verbosity, it all depends on what you want to do. e.g. it's 
normal for XPath to select nodes, then it can be up to you to convert 
them to strings. If we added some XPath helper methods to Node 
it'd look like...




which are probably more clumsy & less clean.

The issues with navigating trees of things like Nodes are

* there are 0..N relationships all over the place (other than for 
attributes of a node where it's 0..1)

* a node has a name(), attributes(), a value() & text() and sometimes 
you need the node / value / text / an attribute. This makes the 
'.text()' postfix in the above expression a requirement in my eyes.
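
To make the different things you can ask a node for concrete, here is a small sketch. The id attribute is an assumption added for the example, and the '@id' shorthand is how current Groovy's Node exposes attributes, which may differ from the API at the time of writing:

```groovy
def html = new NodeBuilder().html {
    body {
        h1(id: "top", "Groovy Baby!")   // id attribute is hypothetical
    }
}

def heading = html.body.h1[0]

println heading.name()         // the element name
println heading.attributes()   // a map of attribute name to value
println heading.'@id'          // a single attribute, 0..1
println heading.text()         // just the text content
```

Which of these you want depends on the caller, so the navigation itself can't guess it for you.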

Imagine this other example...

html = builder.html {
 body {
  h1("Groovy Baby!")
  h1("Another heading")
 }
}

The expression

     html.body.h1

now returns a list of 2 nodes. This might be useful as we might wanna 
iterate over them, filter them etc.

In the first example we could have optimised away the [0] as the body 
node knows there's only 1 header, so we could make html.body.h1 return 
the first node and not require the [0]. However this would mean that 
this expression will only now work for documents with a single h1 
element. i.e. in the presence of a document with 2 h1 elements the 
expression html.body.h1 would change from returning a node to returning 
a list of 2 nodes.

So all in all, I think the [0] and the .text() postfixes are a 
requirement of the default GPath navigation around nodes - as in 1-N 
graphs it's important to have expression polymorphism - i.e. that the 
expression

     html.body.h1[0]

will return the first header, irrespective of how many body & h1 
elements there may be (or it'll return null if there's no h1).
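
A sketch of that polymorphism, run against three made-up documents with one, two and zero h1 elements. firstHeading is a hypothetical helper, and the ?. is only there so the text lookup is skipped when there is no h1:

```groovy
def one  = new NodeBuilder().html { body { h1("Groovy Baby!") } }
def two  = new NodeBuilder().html { body { h1("Groovy Baby!"); h1("Another heading") } }
def none = new NodeBuilder().html { body { p("No heading here") } }

// Hypothetical helper: the same expression is used for every document shape.
def firstHeading = { doc -> doc.body.h1[0]?.text() }

println firstHeading(one)
println firstHeading(two)
println firstHeading(none)
```

The first two calls give the same heading and the third gives null, without the expression changing.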

Though there's nothing to stop you having a special facade which 
operates differently. Another idea we could have is to add a method to 
the list class we use to concatenate text. e.g. if we add a List.text() 
method to the list implementation we use inside the Node 
implementation, which iterates through the contents concatenating the 
Strings or Node.text() values into 1 string, then we could do things 
like

     html.body.h1.text()

which would avoid the [0]. However if there were 2 h1 elements we'd end 
up with both headers concatenated into 1 string, which is probably not 
what people want. i.e. we shouldn't write brittle code where, as soon as 
the markup changes a little, things break or unexpected things occur.
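
To see the problem, here is a stand-in for the List.text() idea written as a plain closure. concatText is a hypothetical name and not an existing method on the list class:

```groovy
// Hypothetical stand-in for List.text(): join the text of every node in the list.
def concatText = { nodes -> nodes.collect { it.text() }.join("") }

def one = new NodeBuilder().html { body { h1("Groovy Baby!") } }
def two = new NodeBuilder().html { body { h1("Groovy Baby!"); h1("Another heading") } }

println concatText(one.body.h1)   // fine while there is a single h1
println concatText(two.body.h1)   // both headings run together
```

With two h1 elements the result is the two headings stuck together with no separator, which is the surprise described above.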

So I think html.body.h1[0].text() on balance is the best choice of 
general purpose navigation. Though like anything YMMV and there could 
always be different APIs / helper methods to navigate things in a 
different way.