jastrachan | 30 Apr 09:15 2004

Re: xml document to node (object)

On 29 Apr 2004, at 23:01, Chris Poirier wrote:
> Hi Richard,
>
>> Interesting ideas. It's hard to settle on one of them.  As long as we
>> are talking about GPath (I think) why is it you do this:
>>
>> page = builder.html { Body { h1("Bla")}}
>> println( page.html.Body.h1[0].text() )
>>
>> WHY THE [] at node 'h1' why not at 'html' and 'Body'.

You could experiment with expressions in a shell and see what they 
return :)

page.Body returns a list of nodes.
page.Body.h1 returns a list of nodes.

In both cases it's a list of 1 node for the above document. The [0] 
pulls out the first member of the list, i.e. it returns a Node rather 
than a List of one Node.
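
To make that concrete, here is a minimal sketch in current Groovy using groovy.util.NodeBuilder. The variable names bodies, headings and first are only illustrative.

```groovy
// NodeBuilder lives in groovy.util, which Groovy imports by default.
def page = new NodeBuilder().html {
    Body {
        h1('Bla')
    }
}

// Each property access on a Node yields a list of child nodes.
def bodies = page.Body
def headings = page.Body.h1

assert bodies instanceof List && bodies.size() == 1
assert headings instanceof List && headings.size() == 1

// The subscript picks a single Node out of the list,
// and text() can then be called on that Node.
def first = headings[0]
assert first instanceof Node
assert first.text() == 'Bla'
```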

>> I don't get it
>> (maybe because I haven't studied XPath)  The following seems more
>> intuitive to me:
>
> I think each step is producing a set.  You /could/ put the [0] at any
> level, but all you'd be doing is constricting the set produced by the
> next offset.  In your case, the result set at the end will be the same
> with or without it.  At the end, you need to get a single node from the
> set, because text() isn't defined on List (I think).

That's exactly right. This is a particular feature of the Node class. If 
you were navigating a bean then it's up to the bean to decide if each 
property is 1 object or a collection of objects. So for a bean, 
page.Body.h1 could be 1 object if you just navigate 1-1 relationships.

e.g.

class Person {
     property Order order
}

class Order {
     property Integer amount
}

a = person.order.amount
assert a instanceof Integer

i.e. there are no collections generated by that navigation path.
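
For anyone who wants to run the bean example, a self-contained sketch follows. It assumes current Groovy, where a property is declared without the property keyword; the amount 42 is just sample data.

```groovy
class Order {
    Integer amount
}

class Person {
    Order order
}

// Sample data, purely illustrative.
def person = new Person(order: new Order(amount: 42))

// Navigating 1-1 relationships yields a single object, not a list.
def a = person.order.amount
assert a instanceof Integer
assert a == 42
```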

However the Node class represents an arbitrary graph of nodes and so 
each property access returns a list of nodes, irrespective of how many 
children a node has - to preserve XML / XPath-ish semantics. The side 
effect of this is that GPath behaves like XPath when navigating Nodes. 
In XPath

     page/Body/h1

is a node-set. If you want a single node in XPath you'd do

     page/Body/h1[1]

(XPath positions are 1-based, whereas Groovy subscripts start at 0.)

So we've made the Node class work similarly, by the Node class just 
deciding that each property is a list of nodes.

The variable page is a node (as an XML document can only have 1 root 
element) but after page, as Chris said, in Groovy you can use [0] at 
any point (indeed we can use [2] or [1..4] or any subscript operation):

page.Body[0].h1[0]

The [0] could be handy if there were multiple Body elements and you 
wanted to pick one, so that you get only the h1 elements of that 
Body.

However if you don't mind about that, then

page.Body.h1

will return a list of all of the h1 elements of all of the Body 
elements. i.e. asking a List for a property, since lists are not beans 
& don't typically have any properties, will return a list of values of 
the properties of its elements.
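
A small sketch of the difference, assuming a made-up document with two Body elements (not valid HTML, but fine for showing the navigation):

```groovy
def page = new NodeBuilder().html {
    Body {
        h1('A')
        h1('B')
    }
    Body {
        h1('C')
    }
}

assert page.Body.size() == 2

// Navigating straight through the list collects the h1
// children of every Body into one flat list.
def all = page.Body.h1
assert all*.text() == ['A', 'B', 'C']

// Subscripting first restricts the result to one Body.
def firstOnly = page.Body[0].h1
assert firstOnly*.text() == ['A', 'B']
```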

e.g. going back to the bean example above. If we had a list of people...

people = [person1, person2, person3]
amounts = people.order.amount
assert amounts instanceof List
assert amounts.size() == 3
assert amounts[0] instanceof Integer

etc

So maybe the 'magic' that's not obvious is how Lists will navigate their 
content when property notation is used. i.e. to be able to navigate 
through 1-N relationships, you don't have to explicitly pull out one 
item in the list with the [i] notation first, you can just navigate 
straight through the list.
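
Here is a runnable sketch of the list case, again assuming current Groovy property syntax and invented amounts. The last assertion shows the explicit spread operator, which should give the same result.

```groovy
class Order {
    Integer amount
}

class Person {
    Order order
}

// Three people with illustrative order amounts.
def people = [10, 20, 30].collect { new Person(order: new Order(amount: it)) }

// Property notation on a list navigates through its elements.
def amounts = people.order.amount
assert amounts instanceof List
assert amounts.size() == 3
assert amounts[0] instanceof Integer
assert amounts == [10, 20, 30]

// Equivalent, with the spread made explicit.
assert people*.order*.amount == amounts
```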

> The assumption is you are accessing the contents of h1, whatever it may 
> be. If you need to access an attribute then you would do this:
>
> page.html.Body.h1.myAtt
>
> Doesn't this seem easier, or am I missing something?

When you say attribute here are you talking about XML attributes?

In XML, we have elements & attributes. The elements are the <tag> 
things and the attributes are the key-value pairs inside the <tag>.

<element attr="foo">....</element>

Now for a given name, say x, an element could have 0..1 attributes and 
0..N child elements of that name. e.g.

<foo x="123"> <x>1</x> <x>2</x> </foo>

if I then in GPath did

x = foo.x

what should it return?

Today it returns a list of 2 nodes as the property access, like XPath, 
is reserved for element traversal only. To access attributes we use a 
different mechanism.

xAttr = foo.attribute("x")

or

xAttr = foo["@x"]

in XPath we'd do

     foo/x

for element access and

     foo/@x

for attribute access.
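
The same foo document can be built with NodeBuilder to check both kinds of access. This is a sketch against the Node class in current Groovy; the attribute is passed to the builder as a named argument.

```groovy
// Equivalent to <foo x="123"><x>1</x><x>2</x></foo>
def foo = new NodeBuilder().foo(x: '123') {
    x('1')
    x('2')
}

// Property access is element traversal: two <x> child nodes.
def xElements = foo.x
assert xElements.size() == 2
assert xElements*.text() == ['1', '2']

// Attributes go through a separate mechanism.
assert foo.attribute('x') == '123'
assert foo['@x'] == '123'
```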

In Groovy, bean.propertyName is the notation for accessing bean 
properties (getter/setters). At some point I'd like to add support for 
explicit field access (i.e. ignore the getter/setter if I have access 
to do so). I'm thinking of using the Ruby style syntax for this

bean.@...

http://jira.codehaus.org/secure/ViewIssue.jspa?key=GROOVY-17

If we did this, the nice benefit would be we'd then have an XPath-ish 
syntax for extracting attribute values in GPath expressions...

xAttr = foo.@...
xElements = foo.x

Of course there's nothing to stop someone writing a Node-like class and 
deciding that they don't want an XML-like tree model where each node 
consists of a Map of attributes and a separate List of child nodes/text 
and instead just flattened the attributes and elements together so that 
each node has a map of names to lists of values such that

foo.x

would return the attribute value, if there were no <x> child elements 
or would return child elements called <x> if there was no x attribute 
or a combination of the attribute value and any child nodes if there 
are both (which admittedly is rare in XML).
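
As a rough sketch of what such a flattened class could look like: FlatNode and its entries map are hypothetical names invented here, not part of Groovy, and the attribute value is simply stored ahead of the child values.

```groovy
// Hypothetical flattened node: attributes and child values
// share one map of names to lists of values.
class FlatNode {
    Map<String, List> entries = [:]

    // Called by Groovy when a property is not otherwise defined.
    def propertyMissing(String name) {
        entries[name] ?: []
    }
}

// Attribute value '123' followed by the two child values.
def foo = new FlatNode(entries: [x: ['123', '1', '2']])

assert foo.x == ['123', '1', '2']
assert foo.y == []
```

The caller can no longer tell which value came from the attribute, which is the trade-off of flattening.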

However in XML navigation it's usually considered that attributes are 
very different to elements and so XML APIs and XPath itself 
explicitly differentiate between attributes and elements - that a user 
must explicitly specify when they want an attribute value or an element 
- and so the Node class follows this policy, that attributes and 
elements have different access mechanisms (in XPath speak we'd probably 
say they are different navigation axes).

James
-------
http://radio.weblogs.com/0112098/
