lucene - Is Solr phrase slop order dependent or not?
I'm using the default example docs and the schema that come with Solr 4.7.0 (the one with the iPods etc.).
A query with phrase slop such as:
http://localhost:8983/solr/collection1/select?wt=json&q=features:%22car%20white%22~4&fl=id,features&omitHeader=true
gives me 2 matching documents:
{ "response":{"numFound":2,"start":0,"docs":[ { "id":"F8V7067-APL-KIT", "features":["car power adapter, white"]}, { "id":"IW-02", "features":["car power adapter for iPod, white"]}] }}
If I change "car white" to "white car", using the same slop value of 4, I only get the first document in the result. Looking at the explain output in /browse, the document returned by both queries says:
(match) weight(features:"white car"~4 in 3)
For the other document, in the first case it says ..."car white"~4 in 4)
Changing the order to "white car" does not match that document.
This seems to imply that it's "somewhat" order dependent... or is it not really? Can anyone explain what's going on here?
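To reproduce the two queries side by side, here is a minimal sketch that builds the select URL for either word order. It assumes the same local example core, field and parameters as the URL in the question; the class and method names are hypothetical, and `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SlopQueryUrl {
    // Base URL of the Solr 4.7 example core used in the question (assumed local install).
    static final String SELECT = "http://localhost:8983/solr/collection1/select";

    // Builds a select URL for a sloppy phrase query on the "features" field.
    // URLEncoder turns ':' into %3A, '"' into %22, ' ' into '+' and '~' into %7E.
    static String buildUrl(String phrase, int slop) {
        String q = "features:\"" + phrase + "\"~" + slop;
        return SELECT
                + "?wt=json"
                + "&q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fl=id,features"
                + "&omitHeader=true";
    }

    public static void main(String[] args) {
        // Same slop, opposite word order.
        System.out.println(buildUrl("car white", 4));
        System.out.println(buildUrl("white car", 4));
    }
}
```

Solr decodes `+` and the percent escapes back to the same query string, so these URLs are equivalent to the hand-written `%22`/`%20` form above.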
When you swap the words, the edit distance increases. A swap essentially adds 2 to the edit distance (since the first edit only moves the words on top of one another).
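The "swap costs 2" rule can be checked with a small sketch. It assumes that, when every query term occurs exactly once in the document, the sloppy phrase distance reduces to the spread between the largest and smallest value of (document position minus query position). This is a simplification for illustration, not Lucene's actual `SloppyPhraseScorer` code; the class and method names are hypothetical.

```java
public class PhraseDistance {
    // For each query term i, compute docPositions[i] - queryPositions[i].
    // The distance is max(offset) - min(offset); an exact phrase gives 0.
    // Assumes each query term occurs exactly once in the document.
    static int distance(int[] queryPositions, int[] docPositions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < queryPositions.length; i++) {
            int offset = docPositions[i] - queryPositions[i];
            min = Math.min(min, offset);
            max = Math.max(max, offset);
        }
        return max - min;
    }

    public static void main(String[] args) {
        // Document "a b": a at position 0, b at position 1.
        // Query "a b": a is query term 0 (doc pos 0), b is query term 1 (doc pos 1).
        System.out.println(distance(new int[]{0, 1}, new int[]{0, 1})); // 0
        // Query "b a": b is query term 0 (doc pos 1), a is query term 1 (doc pos 0).
        System.out.println(distance(new int[]{0, 1}, new int[]{1, 0})); // 2
    }
}
```

Reversing two adjacent words already costs 2 before any intervening words are counted; each word in between then adds 1.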
With the query "car white" you have:
- "car power adapter, white" - distance = 2 (2 words in between)
- "car power adapter for iPod, white" - distance = 4 (4 words in between)
With "white car" you have:
- "car power adapter, white" - distance = 4 (1 swap, 2 words in between)
- "car power adapter for iPod, white" - distance = 6 (1 swap, 4 words in between)
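The four distances above can be recomputed with a rough sketch. It assumes the IW-02 feature text in the Solr example data is "car power adapter for iPod, white" (which is what gives the 4 intervening words counted above), and it approximates the field's analysis by lowercasing and splitting on non-alphanumeric characters, keeping "for" as a position. The distance is the same simplified max/min offset spread, and the class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SlopExample {
    // Rough stand-in for the analysis chain (assumption): lowercase,
    // split on runs of non-letter/non-digit characters, no stopwords removed.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"));
    }

    // Spread of (doc position - query position) offsets.
    // Assumes each phrase term occurs exactly once in the document.
    static int distance(List<String> doc, String... phrase) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int q = 0; q < phrase.length; q++) {
            int pos = doc.indexOf(phrase[q]);
            if (pos < 0) throw new IllegalArgumentException("term not in document: " + phrase[q]);
            int offset = pos - q;
            min = Math.min(min, offset);
            max = Math.max(max, offset);
        }
        return max - min;
    }

    public static void main(String[] args) {
        int slop = 4; // same slop as in the question
        String[] docs = {"car power adapter, white", "car power adapter for iPod, white"};
        String[][] queries = {{"car", "white"}, {"white", "car"}};
        for (String[] query : queries) {
            for (String text : docs) {
                int d = distance(tokenize(text), query);
                // A sloppy phrase matches when distance <= slop.
                System.out.printf("\"%s\" vs \"%s\": distance=%d, matches ~%d: %b%n",
                        String.join(" ", query), text, d, slop, d <= slop);
            }
        }
    }
}
```

Running it prints distances 2, 4, 4 and 6; only the last pair ("white car" against the iPod document) exceeds a slop of 4.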
Since the slop is set to 4 in your query, the last result has too high an edit distance and does not appear. The PhraseQuery.setSlop() documentation describes the behavior of phrase slop, for further reading.