29 Apr 2004 09:31
Re: reading a node graph
On 29 Apr 2004, at 03:26, Dion Almaer wrote:
> To get this I would do:
>
> --------------------------------------------------------
> builder = NodeBuilder.newInstance();
>
> html = builder.html {
> body {
> h1("Groovy Baby!")
> }
> }
>
> println(html.body.h1[0].value())
> --------------------------------------------------------
>
> But it would be nice to have something cleaner,
I'm not sure how much neater we can get, though I'm open to new ideas.
A bit of background...
html.body.h1 walks the node tree, and (talking generically) in XML
there could be 0..N body elements inside html and 0..N h1 elements
inside each body.
So html.body.h1 returns a collection of nodes. You can nest the
navigations as deep as you like and you always get 1 flattened list.
Hence to get just the first node it's
html.body.h1[0]
This is the same irrespective of how many 1-many relationships you walk.
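To make the flattening concrete, here's a minimal Python sketch of those semantics. The Node class and the step()/path() helpers are hypothetical stand-ins, not Groovy's implementation; the point is that each step collects matching children of every node, so two bodies still yield one flat list of h1s.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Node:
    # Hypothetical stand-in for a builder node: a name plus mixed children
    name: str
    children: List[Union["Node", str]] = field(default_factory=list)

def step(nodes: List[Node], name: str) -> List[Node]:
    """One GPath-style step: collect matching children of *every* node,
    so the result is always a single flat list."""
    return [c for n in nodes for c in n.children
            if isinstance(c, Node) and c.name == name]

def path(root: Node, *names: str) -> List[Node]:
    # Walk each name in turn, however many 1-many relationships that crosses
    nodes = [root]
    for name in names:
        nodes = step(nodes, name)
    return nodes

html = Node("html", [
    Node("body", [Node("h1", ["Groovy Baby!"])]),
    Node("body", [Node("h1", ["Second body"]), Node("h1", ["Third"])]),
])

h1s = path(html, "body", "h1")   # like html.body.h1
print(len(h1s))                  # 3: one flat list across both bodies
print(h1s[0].children[0])        # like html.body.h1[0]: Groovy Baby!
```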
To get a list of all h1 nodes it's
html.body.h1
If we then want to filter the collection we can do
html.body.h1.findAll { it.text().contains("cheese") }
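The same select-then-filter pattern can be sketched with Python's standard xml.etree.ElementTree as an analogy (not GPath itself): findall("body/h1") always returns a list, and a list comprehension plays the role of findAll { ... }.

```python
import xml.etree.ElementTree as ET

html = ET.fromstring(
    "<html><body>"
    "<h1>Groovy Baby!</h1><h1>Say cheese</h1><h1>More cheese please</h1>"
    "</body></html>"
)

# findall("body/h1") plays the role of html.body.h1: always a list
h1s = html.findall("body/h1")

# Analogue of html.body.h1.findAll { it.text().contains("cheese") }
cheesy = [h for h in h1s if "cheese" in "".join(h.itertext())]
print([h.text for h in cheesy])  # ['Say cheese', 'More cheese please']
```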
Getting back to the expression in question, to get the value of the
first node it's
html.body.h1[0].value()
or for the text
html.body.h1[0].text()
(Elements can contain both text and child nodes, so if you just want
the text, use the text() method.)
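A rough Python model of why value() and text() differ for mixed content. The Node class here is hypothetical, and its text() assumes the semantics "concatenate string content and, recursively, the text of child nodes"; check the real Node API before relying on that.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Node:
    # Hypothetical node: value holds mixed content (strings and child nodes)
    name: str
    value: List[Union["Node", str]] = field(default_factory=list)

    def text(self) -> str:
        # Assumed semantics: concatenate string content and, recursively,
        # the text of any child nodes
        return "".join(v if isinstance(v, str) else v.text() for v in self.value)

h1 = Node("h1", ["Groovy ", Node("em", ["Baby"]), "!"])
print(h1.value)   # the raw mixed list: two strings around an em Node
print(h1.text())  # Groovy Baby!
```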
In terms of verbosity, it all depends on what you want to do. For
example, it's normal for XPath to select nodes and leave it up to you
to convert them to strings. If we added some XPath helper methods to
Node it'd look like...
html.xpathAsString("body/h1")
or
html.xpath("string(body/h1)")
which are probably clumsier and less clean.
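For comparison, Python's xml.etree.ElementTree (which supports only a small XPath subset, no string() function) shows both styles: findall() selects nodes and leaves the string conversion to you, while findtext() is a string-returning helper roughly in the spirit of the hypothetical xpathAsString above.

```python
import xml.etree.ElementTree as ET

html = ET.fromstring("<html><body><h1>Groovy Baby!</h1></body></html>")

nodes = html.findall("body/h1")    # select nodes, like XPath "body/h1"
as_text = nodes[0].text            # ...converting to a string is up to you
helper = html.findtext("body/h1")  # string helper: text of the first match
print(as_text, helper)             # Groovy Baby! Groovy Baby!
```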
The issues with navigating trees of things like Nodes are
* there are 0..N relationships all over the place (other than for
attributes of a node, where it's 0..1)
* a node has a name(), attributes(), a value() and text(), and sometimes
you need the node, the value, the text or an attribute. This makes the
.text() postfix in the above expression a requirement in my eyes.
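The same split shows up in Python's xml.etree.ElementTree, used here purely as an analogy: an attribute lookup yields at most one value (or None), while child navigation yields a list, and name, attributes and text are separate accessors.

```python
import xml.etree.ElementTree as ET

body = ET.fromstring('<body class="main"><h1>One</h1><h1>Two</h1></body>')

print(body.tag)                 # the name: body
print(body.attrib)              # all attributes: {'class': 'main'}
print(body.get("class"))        # an attribute is 0..1: main
print(body.get("missing"))      # ...or None
print(len(body.findall("h1")))  # child elements are 0..N: 2
```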
Imagine this other example...
html = builder.html {
body {
h1("Groovy Baby!")
h1("Another heading")
}
}
The expression
html.body.h1
now returns a list of 2 nodes. This is useful, as we might want to
iterate over them, filter them, etc.
In the first example we could have optimised away the [0], as the body
node knows there's only 1 header, so we could make html.body.h1 return
the first node and not require the [0]. However, this would mean that
the expression only works for documents with a single h1 element: given
a document with 2 h1 elements, the expression html.body.h1 would change
from returning a node to returning a list of 2 nodes.
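A small Python sketch of that brittleness, using xml.etree.ElementTree and a hypothetical smart_h1() helper that unwraps single matches (the "optimisation" being argued against): the result type flips as soon as a second h1 appears.

```python
import xml.etree.ElementTree as ET

def smart_h1(html):
    # Hypothetical "optimised" navigation: unwrap the list when there is
    # exactly one match. This is the behaviour being argued against.
    found = html.findall("body/h1")
    return found[0] if len(found) == 1 else found

one = ET.fromstring("<html><body><h1>Groovy Baby!</h1></body></html>")
two = ET.fromstring(
    "<html><body><h1>Groovy Baby!</h1><h1>Another heading</h1></body></html>")

print(type(smart_h1(one)).__name__)  # Element: a single node
print(type(smart_h1(two)).__name__)  # list: callers now need different code
```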
So, all in all, I think the [0] and the .text() postfixes are a
requirement of the default GPath navigation around nodes: in 1-N
graphs it's important to have expression polymorphism, i.e. that the
expression...
html.body.h1[0]
will return the first header irrespective of how many body and h1
elements there may be (or return null if there's no h1).
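As an analogy in Python's xml.etree.ElementTree, find() has this same polymorphic shape: it returns the first match or None, whether the document has zero, one or several body and h1 elements.

```python
import xml.etree.ElementTree as ET

docs = [
    "<html><body/></html>",
    "<html><body><h1>Groovy Baby!</h1></body></html>",
    "<html><body><h1>Groovy Baby!</h1></body>"
    "<body><h1>Another heading</h1></body></html>",
]

results = []
for doc in docs:
    first = ET.fromstring(doc).find("body/h1")  # first match, or None
    results.append(None if first is None else first.text)

print(results)  # [None, 'Groovy Baby!', 'Groovy Baby!']
```

The explicit `is None` check matters here: an Element with no children is falsy, so a plain `if first:` would misbehave.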
That said, there's nothing to stop you having a special facade which
operates differently. Another idea is to add a method to the list
class we use, to concatenate text. For example, if we added a
List.text() method to the list implementation we use inside the Node
implementation, which iterates through the contents concatenating the
Strings or Node.text() values into 1 string, then we could do things
like
html.body.h1.text()
which would avoid the [0]. However, if there were 2 h1 elements we'd
end up with both headers concatenated into 1 string, which is probably
not what people want. We shouldn't write brittle code where things
break, or unexpected things happen, as soon as the markup changes a
little.
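A Python sketch of the List.text() idea and its pitfall. NodeList is a hypothetical list subclass built on xml.etree.ElementTree elements; with two h1 elements the headers simply run together.

```python
import xml.etree.ElementTree as ET

class NodeList(list):
    # Hypothetical list type with a text() helper (the List.text() idea):
    # concatenates the text of every element it holds
    def text(self) -> str:
        return "".join("".join(el.itertext()) for el in self)

html = ET.fromstring(
    "<html><body><h1>Groovy Baby!</h1><h1>Another heading</h1></body></html>")
h1s = NodeList(html.findall("body/h1"))
print(h1s.text())  # Groovy Baby!Another heading: both headers run together
```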
So, on balance, I think html.body.h1[0].text() is the best choice for
general-purpose navigation. Though, like anything, YMMV, and there
could always be different APIs or helper methods to navigate things in
a different way.
James
-------
http://radio.weblogs.com/0112098/