Confirm results are reproducible #19
Comments
My laptop:
Results:
EB Ubuntu Dev Env:
Results:
NOTE: The difference between the test runs on my two machines and that of @joewiz appears to be in the number of
|
Desktop
Results
|
Desktop
|
My iMac
Results
This matches Adam's observation that the Failures count alternates between 5166 and 5168, while all other non-Time values remain constant. Across all of the results reported in this issue, the Failures count varies by 0, 1, or 2. |
In eXist-db/exist#3966 (comment) I performed a similar batch of runs of exist-xqts-runner. The 3 runs for that PR differed from one another, similar to the +/- 0-3 differences we saw here. In test 1 of the PR, there were 5,162 failures, but in test 2, a few minutes later, there were 3 fewer: 5,159 failures. All 3 differences occurred within the tests for regular expressions in the fn:matches function:
re00062
<test-case name="re00062">
<description>Test regex syntax</description>
<created by="Michael Kay" on="2011-07-04"/>
<test>(every $s in tokenize('', ',') satisfies matches($s, '^(?:[^\p{IsBasicLatin}]*)$')) and (every $s in tokenize('a', ',') satisfies not(matches($s, '^(?:[^\p{IsBasicLatin}]*)$')))</test>
<result>
<assert-true/>
</result>
</test-case>
Failure in 1st test that passed in 2nd test:
Stacktrace:
re00225
<test-case name="re00225">
<description>Test regex syntax</description>
<created by="Michael Kay" on="2011-07-04"/>
<test>(every $s in tokenize('؀ۿ,؀؁؂؃؄؅؆؇؈؉؊؋،؍؎؏ؘؙؚؐؑؒؓؔؕؖؗ؛؜؝؞؟ؠءآأؤإئابةتثجحخدذرزسشصضطظعغػؼؽؾؿـفقكلمنهوىيًٌٍَُِّْٕٖٜٟٓٔٗ٘ٙٚٛٝٞ٠١٢٣٤٥٦٧٨٩٪٫٬٭ٮٯٰٱٲٳٴٵٶٷٸٹٺٻټٽپٿڀځڂڃڄڅچڇڈډڊڋڌڍڎڏڐڑڒړڔڕږڗژڙښڛڜڝڞڟڠڡڢڣڤڥڦڧڨکڪګڬڭڮگڰڱڲڳڴڵڶڷڸڹںڻڼڽھڿۀہۂۃۄۅۆۇۈۉۊۋیۍێۏېۑےۓ۔ەۖۗۘۙۚۛۜ۝۞ۣ۟۠ۡۢۤۥۦۧۨ۩۪ۭ۫۬ۮۯ۰۱۲۳۴۵۶۷۸۹ۺۻۼ۽۾ۿ', ',') satisfies matches($s, '^(?:\p{IsArabic}+)$')) and (every $s in tokenize('', ',') satisfies not(matches($s, '^(?:\p{IsArabic}+)$')))</test>
<result>
<assert-true/>
</result>
</test-case>
Failure in 1st test that passed in 2nd test:
Stacktrace:
re00061
<test-case name="re00061">
<description>Test regex syntax</description>
<created by="Michael Kay" on="2011-07-04"/>
<test>(every $s in tokenize('Ā', ',') satisfies matches($s, '^(?:[^\p{IsBasicLatin}]+)$')) and (every $s in tokenize('', ',') satisfies not(matches($s, '^(?:[^\p{IsBasicLatin}]+)$')))</test>
<result>
<assert-true/>
</result>
</test-case>
Failure in 1st test that passed in 2nd test:
Stacktrace:
I can't speculate why two runs of exist-xqts-runner a few minutes apart would produce different results. Tests 2 and 3 differed by only 1 test, and this one was in a different location:
group-015
<test-case name="group-015">
<description>No value comparisons are available to compare the grouping keys.</description>
<created by="Josh Spiegel" on="2012-10-02"/>
<modified by="Michael Kay" on="2017-03-17" change="avoid assert-xml for non-XML results"/>
<test>
for $x in (true(), "true", xs:QName("true"))
group by $x
return $x
</test>
<result>
<assert-permutation>true(), "true", xs:QName("true")</assert-permutation>
</result>
</test-case>
The failure in test 3 that passed in test 2:
Stacktrace:
Comparing test 1 to test 3, the same 4 tests listed above account for all of the differences. This would explain the consistent 0-3 range of variation in the results that we all reported:
Note that we didn't see a variation of 4, for the case where 1 test failed both the 1 ...
This is just a running theory. Perhaps there are other tests that fail besides these, and only additional runs and comparisons would reveal them.
To check which testsuites were responsible for the difference between 2 test runs, save the ...
xquery version "3.1";
(: Load the testsuite summaries of the two runs to compare :)
let $tss1 := doc("/db/apps/exist-xqts-results/data/5.4.0-SNAPSHOT-with-Juri-PR/test01/junit/data/TESTS-TestSuites.xml")/testsuites/testsuite
let $tss2 := doc("/db/apps/exist-xqts-results/data/5.4.0-SNAPSHOT-with-Juri-PR/test03/junit/data/TESTS-TestSuites.xml")/testsuites/testsuite
return
    array {
        for $ts1 in $tss1
        let $ts1-failures := $ts1/@failures
        (: Find the matching testsuite in the second run by package and name :)
        let $ts2 := $tss2[@package eq $ts1/@package and @name eq $ts1/@name]
        let $ts2-failures := $ts2/@failures
        return
            (: Report only the testsuites whose failure counts differ :)
            if ($ts1-failures ne $ts2-failures) then
                map {
                    "package": $ts1/@package/string(),
                    "name": $ts1/@name/string(),
                    "ts1-failures": $ts1/@failures cast as xs:integer,
                    "ts2-failures": $ts2/@failures cast as xs:integer
                }
            else
                ()
    }
This returns a result like:
[
    {
        "package": "XQTS_HEAD.fn-matches",
        "name": "re",
        "ts1-failures": 7,
        "ts2-failures": 4
    },
    {
        "package": "XQTS_HEAD",
        "name": "prod-GroupByClause",
        "ts1-failures": 15,
        "ts2-failures": 16
    }
]
To derive a table like the one I posted in the PR comment linked above, which listed the tests that returned different results in 2 test runs, I uploaded the entire junit directories to eXist and ran the following query:
xquery version "3.1";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "html5";
declare option output:media-type "text/html";

(: Emit one table row showing a testcase's outcome in each run :)
declare function local:compare-testcase($testcase-1, $testcase-2) {
    element tr {
        element td { $testcase-1/../@package/string() },
        element td { $testcase-1/../@name/string() },
        element td { $testcase-1/@name/string() },
        (: Name of the first child element (e.g. error, failure, skipped), or "pass" if there is none :)
        element td { ($testcase-1/*/name(), "pass")[. ne ""][1] },
        element td { ($testcase-2/*/name(), "pass")[. ne ""][1] }
    }
};

(: Emit rows only for testcases whose outcome differs between the two runs :)
declare function local:compare-testcases($testcases-1, $testcases-2) {
    for $tc1 in $testcases-1
    let $name := $tc1/@name
    let $tc2 := $testcases-2[@name eq $name]
    order by $name
    return
        (: Same outcome in both runs: both passed, errored, failed, or were skipped :)
        if (
            (empty($tc1/node()) and empty($tc2/node()))
            or
            ($tc1/error and $tc2/error)
            or
            ($tc1/failure and $tc2/failure)
            or
            ($tc1/skipped and $tc2/skipped)
        ) then
            ()
        else
            local:compare-testcase($tc1, $tc2)
};

(: Build the HTML table, pairing testsuites by package and name :)
declare function local:compare-testsuites($testsuites-1, $testsuites-2) {
    element table {
        element thead {
            element tr {
                element th { "testsuite package" },
                element th { "testsuite name" },
                element th { "testcase name" },
                element th { "test 1" },
                element th { "test 2" }
            }
        },
        element tbody {
            for $ts1 in $testsuites-1
            let $package := $ts1/@package
            let $name := $ts1/@name
            let $ts2 := $testsuites-2[@package eq $package and @name eq $name]
            order by $package, $name
            return
                (: Skip testsuites that report zero errors in both runs :)
                if ($ts1/@errors eq "0" and $ts2/@errors eq "0") then
                    ()
                else
                    local:compare-testcases($ts1/testcase, $ts2/testcase)
        }
    }
};

let $data-collection := "/db/apps/exist-xqts-results/data"
let $testsuites-1 :=
    doc($data-collection || "/5.4.0-SNAPSHOT-before-Juri-PR/test01/junit/data/TESTS-TestSuites.xml")/testsuites/testsuite
let $testsuites-2 :=
    doc($data-collection || "/5.4.0-SNAPSHOT-with-Juri-PR/test01/junit/data/TESTS-TestSuites.xml")/testsuites/testsuite
return
    local:compare-testsuites($testsuites-1, $testsuites-2)
... returns a table like this:
I hope these results and queries help us nail down the sources of unexpected variation in the results of exist-xqts-runner. |
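For anyone who would rather compare two runs locally, without uploading the junit directories to eXist, the testcase-level comparison can be roughly sketched in Python. This is only a sketch and not part of exist-xqts-runner. It assumes the report is a JUnit-style TESTS-TestSuites.xml in which each testsuite element carries package and name attributes and each testcase is marked by a failure, error, or skipped child element. The function names outcomes and diff_runs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Child elements that mark a non-passing testcase (assumed JUnit-style report).
NON_PASSING = ("failure", "error", "skipped")


def outcomes(source):
    """Map (package, testsuite name, testcase name) to the testcase's outcome.

    `source` is a file path or an open file object containing the XML report.
    """
    root = ET.parse(source).getroot()
    result = {}
    for suite in root.iter("testsuite"):
        for case in suite.findall("testcase"):
            # The first failure/error/skipped child decides the outcome;
            # a testcase with none of them counts as a pass.
            status = next(
                (child.tag for child in case if child.tag in NON_PASSING),
                "pass",
            )
            key = (suite.get("package", ""), suite.get("name", ""), case.get("name", ""))
            result[key] = status
    return result


def diff_runs(source_1, source_2):
    """Return sorted rows (package, suite, testcase, outcome 1, outcome 2)
    for every testcase whose outcome differs between the two runs."""
    run_1 = outcomes(source_1)
    run_2 = outcomes(source_2)
    rows = []
    for key in sorted(run_1.keys() | run_2.keys()):
        status_1 = run_1.get(key, "missing")  # testcase absent from run 1
        status_2 = run_2.get(key, "missing")  # testcase absent from run 2
        if status_1 != status_2:
            rows.append((*key, status_1, status_2))
    return rows
```

Calling diff_runs with the paths of the TESTS-TestSuites.xml files from two runs returns one row per testcase that changed outcome. Unlike the XQuery above, this sketch does not skip testsuites that report zero errors, and it lists testcases that appear in only one run as "missing".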
Impressive research and analysis! |
Order variations in results sound like... usage of a HashMap somewhere. |
On yesterday's Community Call we discussed evidence indicating users obtained different results despite using identical versions of the exist-xqts-runner and command line flags. I propose we use the latest directions derived from #17 and gather results from as many users as possible:
1. git clone https://github.com/exist-db/exist-xqts-runner.git (ensure a fresh clone of the master branch, no local modifications)
2. cd exist-xqts-runner
3. sbt assembly
4. target/scala-2.13/exist-xqts-runner-assembly-1.0.0.jar -x HEAD
5. Open target/junit/html/index.html and copy and paste the "Summary" table into your reply to this issue. GitHub is smart enough to transform your HTML into GFM, no fiddling needed.
For example, here are my results: