Florian Hars | 20 Sep 2009 17:18

Re: Load an XML document w/ internal DTD

Normen Müller wrote:
> Any ideas?

ConstructingParser is non-validating. My first idea was the code at the end of
this mail, but that ran afoul of some broken SYSTEM id resolution logic:

hars@st11:~/tmp$ scala parseFromURL
java.io.FileNotFoundException:
/home/hars/tmp/book.xml./dtd/docbook-4.4/docbookx.dtd (No such file or directory)
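The path in that exception looks as if the relative system id was simply
appended to the full path of the document instead of being resolved against
the document's directory. As a rough sketch of what I would have expected,
here is the same resolution done with java.net.URI. The object and method
names are made up for illustration, and the relative system id is only
inferred from the error message above; this is not what the Scala library
does internally.

```scala
import java.net.URI

// Hypothetical helper, not part of scala.xml: resolves a system id
// against the URI of the document that refers to it.
object ResolveSystemId {
  def resolve(documentUri: String, systemId: String): URI =
    new URI(documentUri).resolve(systemId)

  def main(args: Array[String]): Unit = {
    // Assumed inputs, reconstructed from the FileNotFoundException above.
    val doc = "file:/home/hars/tmp/book.xml"

    // A relative system id replaces the last path segment (book.xml).
    println(resolve(doc, "./dtd/docbook-4.4/docbookx.dtd"))

    // An absolute system id is returned unchanged.
    println(resolve(doc, "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"))
  }
}
```

With this kind of resolution the relative id would end up under
/home/hars/tmp/dtd/... rather than under book.xml.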

Then I tried to use this file instead:

<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
[
<!ENTITY foreword SYSTEM "foreword.xml" >
]>

<book id="foo">
  &foreword;
</book>

and ended up with:

http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:81:1: markupdecl:
unexpected character ']' #93]]>^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:81:2: markupdecl:
unexpected character ']' #93]]> ^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:81:3: markupdecl:
unexpected character '>' #62]]>  ^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:113:30: ']' expected
instead of '"'<!ENTITY euro SDATA "[euro  ]"><!-- euro sign -->
             ^
java.lang.RuntimeException: FATAL

With the 4.1 DTD that error is gone, but the resulting output is:

<book id="foo"><!-- foreword; --></book>
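So the entity reference is kept as a comment rather than being replaced by
the content of foreword.xml. For comparison, here is a small sketch that
goes through the JDK's own DOM parser (javax.xml.parsers) instead of
scala.xml.parsing. It assumes the JDK's default settings, which as far as I
know expand external general entities; a locked-down XML configuration may
refuse to load them. The object name, the temporary files and the content
of foreword.xml are made up for the example, and the DOCTYPE has only an
internal subset so that nothing is fetched from the network.

```scala
import java.nio.file.Files
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical demo, not the scala.xml parser discussed in this thread.
object ExpandEntityDemo {
  def bookText(): String = {
    val dir = Files.createTempDirectory("entity-demo")

    // Made-up external entity content standing in for the real foreword.
    Files.write(dir.resolve("foreword.xml"),
                "<preface>Hello</preface>".getBytes("UTF-8"))

    // Internal DTD subset only; the entity points at the file above.
    val book =
      """<?xml version="1.0"?>
        |<!DOCTYPE book [
        |<!ENTITY foreword SYSTEM "foreword.xml">
        |]>
        |<book id="foo">&foreword;</book>""".stripMargin
    val bookFile = dir.resolve("book.xml")
    Files.write(bookFile, book.getBytes("UTF-8"))

    // Parsing from a File gives the parser a base URI, so the relative
    // system id "foreword.xml" is resolved next to book.xml.
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(bookFile.toFile)
    doc.getDocumentElement.getTextContent
  }

  def main(args: Array[String]): Unit =
    println(bookText())
}
```

If the entity is expanded, the text content of the book element is the text
of the included preface instead of an empty string.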

This bit from the MarkupHandler documentation may be relevant:

 Todo
    can we ignore more entity declarations (i.e. those with extIDs)?
    expanding entity references

- Florian

import scala.xml.parsing._
import scala.xml._
import java.io.File
import scala.io.Source

class ConstructingValidatingParser(val input: Source, val preserveWS: Boolean)
extends  ValidatingMarkupHandler
with     ExternalSources
with     MarkupParser {

  /* Copy & paste,
     since ConstructingHandler is an abstract class, not a trait
  */
  def elem(pos: Int, pre: String, label: String, attrs: MetaData,
           pscope: NamespaceBinding, nodes: NodeSeq): NodeSeq =
    Elem(pre, label, attrs, pscope, nodes: _*)

  def procInstr(pos: Int, target: String, txt: String) =
    ProcInstr(target, txt)

  def comment(pos: Int, txt: String) =
    Comment(txt)

  def entityRef(pos: Int, n: String) =
    EntityRef(n)

  def text(pos: Int, txt:String) =
    Text(txt)

}

object parseFromURL {
  def main(args: Array[String]): Unit = {
    // book.xml is looked up relative to the current directory
    val src = scala.io.Source.fromURL("file:book.xml")
    val cpa = new ConstructingValidatingParser(src, false)
    // read the first character before parsing starts
    cpa.nextch
    val doc = cpa.document()
    // let's see what it is
    val ppr = new scala.xml.PrettyPrinter(80, 5)
    val ele = doc.docElem
    Console.println("finished parsing")
    val out = ppr.format(ele)
    Console.println(out)
  }
}

