Description
The OPS API natively returns XML via HTTP. v0.1.0 introduced parsing capabilities for methods, so that users can work with nicely formatted and easy-to-use Go structs instead of clunky stringified XML. Unfortunately, this does not seem to work properly for the Client.Search method, which delegates XML parsing to epo_ops.ParseSearch.
package main

import (
	"context"
	"fmt"

	ops "github.com/patent-dev/epo-ops"
)

const (
	key    = "…"
	secret = "…"
)

func main() {
	// Create an authenticated OPS client.
	client, err := ops.NewClient(&ops.Config{
		ConsumerKey:    key,
		ConsumerSecret: secret,
	})
	if err != nil {
		fmt.Printf("authenticate OPS API: %v\n", err)
		return
	}

	// Request the first five hits for patents with "battery" in the title.
	patents, err := client.Search(context.Background(), "ti=battery", "1-5")
	if err != nil {
		fmt.Printf("search patents: %v\n", err)
		return
	}

	fmt.Printf("TotalCount: %d\n", patents.TotalCount)
	fmt.Printf("len(.Results): %d\n", len(patents.Results))
	for _, patent := range patents.Results {
		fmt.Printf("- %s\n", patent.DocNumber)
	}
}

$ go run main.go
TotalCount: 10000
len(.Results): 0
There's no documentation for what SearchResultData.TotalCount means, but I guess it is the total number of matches in their database, as reported by the OPS API. In that case 10,000 (see the footnote) should be correct for the broad search ti=battery.
However, the actual search results slice patents.Results is empty, although it should contain exactly 5 items.
XML Output
Replace Client.Search with ops.SearchRaw to skip the parsing step and obtain the XML:
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="../../style/exchange.xsl"?>
<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org" xmlns:xlink="http://www.w3.org/1999/xlink">
<ops:biblio-search total-result-count="10000" publications-count="5">
<ops:query syntax="CQL">ti = battery</ops:query>
<ops:range begin="1" end="5"/>
<ops:search-result>
<ops:publication-reference system="ops.epo.org" family-id="78468024">
<document-id document-id-type="docdb">
<country>ES</country>
<doc-number>3051365</doc-number>
<kind>T3</kind>
</document-id>
</ops:publication-reference>
<ops:publication-reference system="ops.epo.org" family-id="76686681">
<document-id document-id-type="docdb">
<country>ES</country>
<doc-number>3051364</doc-number>
<kind>T3</kind>
</document-id>
</ops:publication-reference>
<ops:publication-reference system="ops.epo.org" family-id="77338870">
<document-id document-id-type="docdb">
<country>AU</country>
<doc-number>2025271196</doc-number>
<kind>A1</kind>
</document-id>
</ops:publication-reference>
<ops:publication-reference system="ops.epo.org" family-id="85412822">
<document-id document-id-type="docdb">
<country>AU</country>
<doc-number>2025271216</doc-number>
<kind>A1</kind>
</document-id>
</ops:publication-reference>
<ops:publication-reference system="ops.epo.org" family-id="74100935">
<document-id document-id-type="docdb">
<country>AU</country>
<doc-number>2025271176</doc-number>
<kind>A1</kind>
</document-id>
</ops:publication-reference>
</ops:search-result>
</ops:biblio-search>
</ops:world-patent-data>
Steps to reproduce
Basically just run the above code, but be aware of #1 and my temporary fix for testing: #1 (comment).
Cause
As far as I understand, this is caused by epo_ops.ParseSearch, and especially by the annotated struct searchXML that it uses to unmarshal the stringified XML.
searchXML seems to be made for Client.SearchWithConstituents rather than Client.Search. Apparently, both methods use ops.ParseSearch and try to unmarshal into searchXML, even though the OPS API returns different XML structures depending on the constituents used in the search.
Right now, searchXML expects an XML element named <ops:biblio-search>, whereas the "normal" search (without constituents) returns its elements within <ops:search-result>. The elements parsed below that level are similar but not identical, so the current implementation of searchXML won't work for all cases.
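To illustrate the failure mode, here is a minimal sketch using only encoding/xml. It does not reproduce the library's real searchXML: the `mismatched` struct is a hypothetical stand-in that looks for `<exchange-document>` elements (an assumed shape for constituent responses), while `matching` follows the nesting of the response shown above. Both struct names and both helper functions are made up for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sample is a trimmed copy of the response shown above (two publications).
const sample = `<?xml version="1.0" encoding="UTF-8"?>
<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
  <ops:biblio-search total-result-count="10000">
    <ops:search-result>
      <ops:publication-reference system="ops.epo.org" family-id="78468024">
        <document-id document-id-type="docdb">
          <country>ES</country><doc-number>3051365</doc-number><kind>T3</kind>
        </document-id>
      </ops:publication-reference>
      <ops:publication-reference system="ops.epo.org" family-id="76686681">
        <document-id document-id-type="docdb">
          <country>ES</country><doc-number>3051364</doc-number><kind>T3</kind>
        </document-id>
      </ops:publication-reference>
    </ops:search-result>
  </ops:biblio-search>
</ops:world-patent-data>`

// mismatched is a HYPOTHETICAL stand-in for a struct written for another
// response shape. The <exchange-document> path is an assumption and does not
// occur in the sample.
type mismatched struct {
	BiblioSearch struct {
		TotalCount int `xml:"total-result-count,attr"`
		Documents  []struct {
			DocNumber string `xml:"doc-number,attr"`
		} `xml:"search-result>exchange-documents>exchange-document"`
	} `xml:"biblio-search"`
}

// matching follows the nesting that the plain search response actually uses.
type matching struct {
	BiblioSearch struct {
		TotalCount   int `xml:"total-result-count,attr"`
		Publications []struct {
			FamilyID  string `xml:"family-id,attr"`
			DocNumber string `xml:"document-id>doc-number"`
		} `xml:"search-result>publication-reference"`
	} `xml:"biblio-search"`
}

// countMismatched returns the reported total and the number of parsed items.
func countMismatched(data string) (total, n int, err error) {
	var raw mismatched
	if err = xml.Unmarshal([]byte(data), &raw); err != nil {
		return 0, 0, err
	}
	return raw.BiblioSearch.TotalCount, len(raw.BiblioSearch.Documents), nil
}

// countMatching does the same with the struct that matches the sample.
func countMatching(data string) (total, n int, err error) {
	var raw matching
	if err = xml.Unmarshal([]byte(data), &raw); err != nil {
		return 0, 0, err
	}
	return raw.BiblioSearch.TotalCount, len(raw.BiblioSearch.Publications), nil
}

func main() {
	total, n, err := countMismatched(sample)
	fmt.Println(total, n, err) // 10000 0 <nil>

	total, n, err = countMatching(sample)
	fmt.Println(total, n, err) // 10000 2 <nil>
}
```

Note that xml.Unmarshal returns no error in the mismatched case: elements without a corresponding field are silently skipped, which would explain a correct TotalCount next to an empty Results slice.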
Also, it seems there are no test cases that could have caught this bug.
Proposed Fix
A minimal working example is:
Modify
// xml.go
type searchXML struct {
	// […] existing fields
	// add:
	SearchResult struct {
		Publications []struct {
			System     string `xml:"system,attr"`
			FamilyID   string `xml:"family-id,attr"`
			DocumentID struct {
				Country   string `xml:"country"`
				DocNumber string `xml:"doc-number"`
				Kind      string `xml:"kind"`
			} `xml:"document-id"`
		} `xml:"publication-reference"`
	} `xml:"search-result"`
	// […] existing fields
}

func ParseSearch(xmlData string) (*SearchResultData, error) {
	// […] after line 1113, add
	for _, pub := range raw.BiblioSearch.SearchResult.Publications {
		data.Results = append(data.Results, SearchResult{
			System:    pub.System,
			FamilyID:  pub.FamilyID,
			Country:   pub.DocumentID.Country,
			DocNumber: pub.DocumentID.DocNumber,
			Kind:      pub.DocumentID.Kind,
		})
	}
	// […] remaining code
}

This accounts for the different XML structure when no constituents are used by adding the nested struct SearchResult `xml:"search-result"` to searchXML. The explicit "parsing" added to ParseSearch is needed because ParseSearch does NOT return a 1:1 XML-to-struct copy, but actively modifies the structure.
This makes it work, although I do NOT recommend actually using it this way. I believe the OPS API can return even more XML structures, depending on the constituents specified.
I can think of two different fixes:
- Use searchXML for every possible XML structure that a search with or without constituents returns. The root elements are similar (identical?) and only nested elements change slightly. Unmarshal to searchXML and omit nil values. This would hopefully cover all cases. However, SearchResultData and SearchResult, which hold the actual properties for each patent, might also need to be modified accordingly.
- Define different structures for each possible XML structure. Either write multiple ParseSearch methods (one for each XML structure) or make ParseSearch aware of this.
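The first option could look roughly like the following sketch, which declares both shapes in one struct and normalizes whichever one was populated. Everything here is an assumption for illustration: `tolerantXML`, `Result`, and `parseTolerant` are hypothetical names, not the library's types, and the `<exchange-documents>` layout in `constituentXML` is an assumed shape with made-up values, not a captured API response.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Result is a hypothetical flattened record, loosely modeled on the
// SearchResult fields used in the proposed fix above.
type Result struct {
	FamilyID, Country, DocNumber, Kind string
}

// tolerantXML declares both response shapes. encoding/xml leaves the slice
// for the shape that is absent as nil, so no error handling is needed for it.
type tolerantXML struct {
	BiblioSearch struct {
		TotalCount   int `xml:"total-result-count,attr"`
		SearchResult struct {
			// Shape of a search without constituents (see the XML output above).
			Publications []struct {
				FamilyID  string `xml:"family-id,attr"`
				Country   string `xml:"document-id>country"`
				DocNumber string `xml:"document-id>doc-number"`
				Kind      string `xml:"document-id>kind"`
			} `xml:"publication-reference"`
			// ASSUMPTION: shape of a search with constituents.
			Documents []struct {
				FamilyID  string `xml:"family-id,attr"`
				Country   string `xml:"country,attr"`
				DocNumber string `xml:"doc-number,attr"`
				Kind      string `xml:"kind,attr"`
			} `xml:"exchange-documents>exchange-document"`
		} `xml:"search-result"`
	} `xml:"biblio-search"`
}

// parseTolerant unmarshals either shape and returns the reported total
// together with a normalized result list.
func parseTolerant(data string) (int, []Result, error) {
	var raw tolerantXML
	if err := xml.Unmarshal([]byte(data), &raw); err != nil {
		return 0, nil, fmt.Errorf("unmarshal search response: %w", err)
	}
	var out []Result
	for _, p := range raw.BiblioSearch.SearchResult.Publications {
		out = append(out, Result{p.FamilyID, p.Country, p.DocNumber, p.Kind})
	}
	for _, d := range raw.BiblioSearch.SearchResult.Documents {
		out = append(out, Result{d.FamilyID, d.Country, d.DocNumber, d.Kind})
	}
	return raw.BiblioSearch.TotalCount, out, nil
}

// plainXML is a one-publication excerpt of the response shown above.
const plainXML = `<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
<ops:biblio-search total-result-count="10000"><ops:search-result>
<ops:publication-reference system="ops.epo.org" family-id="78468024">
<document-id document-id-type="docdb"><country>ES</country><doc-number>3051365</doc-number><kind>T3</kind></document-id>
</ops:publication-reference>
</ops:search-result></ops:biblio-search></ops:world-patent-data>`

// constituentXML is an ASSUMED shape with placeholder values.
const constituentXML = `<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
<ops:biblio-search total-result-count="1"><ops:search-result>
<exchange-documents>
<exchange-document system="ops.epo.org" family-id="1" country="XX" doc-number="0000001" kind="A1"/>
</exchange-documents>
</ops:search-result></ops:biblio-search></ops:world-patent-data>`

func main() {
	for _, doc := range []string{plainXML, constituentXML} {
		total, results, err := parseTolerant(doc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(total, results)
	}
}
```

The trade-off is that every additional response shape adds another optional branch to the same struct. The second option keeps each struct small, at the cost of having to decide up front which one to unmarshal into.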
Footnotes
1. IIRC from reading the OPS API docs, 10,000 is the maximum number of hits the OPS API keeps a cursor for, so it never reports more than 10,000 hits.