Nobody likes XML, except masochists and Microsoft consultants who charge by the hour.
Sometimes you are forced to deal with XML anyway, such as parsing the response from external APIs.
Parsing XML is per se a nasty business to begin with, but in this particular example, the ugliness set new records.
The only “solution” I came up with is too ugly for production use, but the alternatives were even uglier, so I had no option.
I hope there is someone out there reading this who can present a proper solution to the problem.
Let’s say you have this XML:
<testxml xmlns:ns1="http://www.example.com" xmlns:ns2="http://www.example2.com"> <ns1:foo> <ns2:bar>baz</ns2:bar> </ns1:foo> </testxml>
Using xpath()
you extract the content of testxml
:
SELECT xpath( '/testxml/ns1:foo', '<testxml xmlns:ns1="http://www.example.com" xmlns:ns2="http://www.example2.com"><ns1:foo><ns2:bar>baz</ns2:bar></ns1:foo></testxml>'::xml, ARRAY[ ['ns1','http://www.example.com'], ['ns2','http://www.example2.com'] ] ); -- Result: <ns1:foo><ns2:bar>baz</ns2:bar></ns1:foo>
The returned XML is not valid since its missing the xmlns definitions,
but the PostgreSQL XML data type doesn’t complain, which is OK I guess,
a bit of quirks mode perhaps?
Because of the missing xmlns, it’s impossible to make use of this XML fragment returned.
You cannot extract any subsequent sub-elements in it using XPath.
For instance, this won’t work:
SELECT xpath( '/ns1:foo/ns2:bar/text()', (xpath( '/testxml/ns1:foo', '<testxml xmlns:ns1="http://www.example.com" xmlns:ns2="http://www.example2.com"><ns1:foo><ns2:bar>baz</ns2:bar></ns1:foo></testxml>'::xml, ARRAY[ ['ns1','http://www.example.com'], ['ns2','http://www.example2.com'] ] ))[1] ); -- Error: -- ERROR: could not create XPath object -- DETAIL: namespace error : Namespace prefix ns1 on foo is not defined -- <ns1:foo> -- ^ -- namespace error : Namespace prefix ns2 on bar is not defined -- <ns2:bar>baz</ns2:bar> -- ^ -- Undefined namespace prefix -- xmlXPathCompiledEval: evaluation failed
Even if you pass it the NSArray in the outer xpath() call, you don’t get ‘baz’ but nothing at all.
SELECT xpath( '/ns1:foo/ns2:bar/text()', (xpath( '/testxml/ns1:foo', '<testxml xmlns:ns1="http://www.example.com" xmlns:ns2="http://www.example2.com"><ns1:foo><ns2:bar>baz</ns2:bar></ns1:foo></testxml>'::xml, ARRAY[ ['ns1','http://www.example.com'], ['ns2','http://www.example2.com'] ] ))[1], ARRAY[ ['ns1','http://www.example.com'], ['ns2','http://www.example2.com'] ] ); -- Returns: -- xpath -- ------- -- {} -- (1 row) -- And NOT 'baz' which is what we want.
Therefore, given you have an XML “fragment” where the tags have ns:___ but without the xmlns:___=”URI” part,
is there any way to extract sub-elements, given you know the namespaces?
For instance, by wrapping the XML “fragment” inside an outer XML tag,
and specifying bogous xmlns elements for the namespaces,
I managed to hack together a work-around, probably with hundreds of flaws,
and this cannot possibly be the best way to approach this problem.
SELECT xpath_fragment('/ns1:foo/ns2:bar/text()','<ns1:foo><ns2:bar>baz</ns2:bar></ns1:foo>',ARRAY['ns1','ns2']); xpath_fragment ---------------- {baz} (1 row)
Source code of insanely ugly “solution” to the “problem”:
CREATE OR REPLACE FUNCTION xpath_fragment(_XPath text, _XML xml, _NSNames text[]) RETURNS XML[] AS $BODY$ DECLARE _ text; _WrappedXML xml; _NSArray text[][] := ARRAY[]::text[][]; BEGIN SELECT ('<xml ' || array_to_string(array_agg('xmlns:' || unnest || '=" "'),' ') || '>' || _XML::text || '</xml>')::text INTO _WrappedXML FROM unnest(_NSNames); FOR _ IN SELECT unnest(_NSNames) LOOP _NSArray := _NSArray || ARRAY[[_, ' ']]; END LOOP; RETURN xpath('/xml' || _XPath, _WrappedXML, _NSArray); END; $BODY$ LANGUAGE plpgsql IMMUTABLE;
Ideas, anyone?
If you’re guaranteed not to have conflicting names in different namespaces, this works:
SELECT xpath(
‘/foo/bar/text()’,
(xpath(
‘/testxml/ns1:foo’,
‘baz’::xml,
ARRAY[
[‘ns1′,’http://www.example.com’],
[‘ns2′,’http://www.example2.com’]
]
))[1]
);
xpath
——-
{baz}
(1 row)
However, if there are several, you’re screwed unless you want all of them or only a number of them but don’t care which ones:
select xpath(‘/foo/bar/text()’, xml ‘ns1barns2bar’);
xpath
—————–
{ns1bar,ns2bar}
(1 row)
If someone can come up with something better, I’m interested too. 🙂
I also find this super-annoying, especially since I hack on an application that does a lot of munging of XML stored in a PostgreSQL database. I ended up running with a PL/Perl stored procedure that wraps XML::LibXML and makes sure to return declarations for all namespaces defined in the original XML. Here it is (credit to Mike Rylander and myself):
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
perl_xpath.sql
hosted with ❤ by GitHub
Using perl_xpath with your example seems to have the desired effect (though note that the XPath expression should probably be ‘/ns1:testxml/ns1:foo’, not ‘/testxml/ns1:foo’.
Sometimes you are forced to deal with XML anyway, such as parsing the response from external APIs.
Parsing XML is per se a nasty business to begin with, but in this particular example, the ugliness set new records. The only “solution” I came up with is too ugly for production use, but the alternatives were even uglier, so I had no web design.
The only “solution” I came up with is too ugly for production use, but the alternatives were even uglier, so I had no option.it had hammered the initial proposals with an advertising campaign that may have been the final nail in the coffin of Kevin Rudd. singapore accounting software
Nobody likes XML, except masochists and Microsoft consultants who charge by the hour.Nevertheless HCG drops
I hope there is someone out there reading this who can present a proper solution to the problem.While Oxford and Cambridge per se may be fair targets eye doctor portland
I hope there is someone out there reading this who can present a proper solution to the problem.Howeverforskolin for weight loss
I hope there is someone out there reading this who can present a proper solution to the problem.For examplespartagen XT
I hope there is someone out there reading this who can present a proper solution to the problem.The Greens have generally applauded Moylan’s actions. Hay Day Cheats Tool APK – Diamonds
I hope there is someone out there reading this who can present a proper solution to the problem.He could face jail time. He has said that his intention was not to cause people to lose moneyhow to lose weight
Have you tried like this?
SELECT xpath(
‘/testxml/*[local-name() = “foo”]/*[local-name() = “bar”]/text()’,
‘baz’::xml
);