Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#186925787]: Fix pairwise comparison #393

Merged
merged 4 commits into from
Feb 8, 2024

Conversation

slobodan-ilic
Copy link
Contributor

  • Pairwise comparison t-stats to use weighted bases

@slobodan-ilic slobodan-ilic force-pushed the fix-column-bases-186925787 branch 6 times, most recently from e483fcb to d287a8b Compare January 27, 2024 12:54
Copy link

codecov bot commented Jan 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (cfeba1a) 100.00% compared to head (b11e20f) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #393   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           20        20           
  Lines         4620      4621    +1     
=========================================
+ Hits          4620      4621    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@ernestoarbitrio ernestoarbitrio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some minor comments

@@ -1545,21 +1545,27 @@ def _column_bases(self):
calculated column bases (either 'effective' or unweighted counts) for
the analysis.
"""
unweighted_blocks = self._second_order_measures.column_unweighted_bases.blocks
# unweighted_blocks = self._second_order_measures.column_unweighted_bases.blocks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get rid of commented code plz :D

Comment on lines 1554 to 1555
# unweighted_blocks[0][0] ** 2 / squared_blocks[0][0],
# unweighted_blocks[0][1] ** 2 / squared_blocks[0][1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

else _SortRowsByMarginalHelper
if collation_method == CM.MARGINAL
else _RowOrderHelper
else (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this is quite difficult to read. Could you think about something more readable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is crap... I dunno man... will try to figure out something better... maybe the switch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this better:

    @classmethod
    def factory(cls, cube, dimensions, slice_idx):
        """Return _BaseCubeMeans subclass instance appropriate to `cube`.

        Raises `ValueError` if the cube-result does not include a cube-means measure.
        """
        if cube.means is None:
            raise ValueError("cube-result does not contain cube-means measure")

        def get_cube_means_cls(dim_types):
            if dim_types == (DT.MR, DT.MR):
                return _MrXMrCubeMeans
            if dim_types[0] == DT.MR:
                return _MrXCatCubeMeans
            if dim_types[1] == DT.MR:
                return _CatXMrCubeMeans
            return _CatXCatCubeMeans

        dimension_types = cube.dimension_types[-2:]
        CubeMeansCls = get_cube_means_cls(dimension_types)

        return CubeMeansCls(
            dimensions, cube.means[cls._slice_idx_expr(cube, slice_idx)]
        )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I like it more

],
]
return effective_blocks
return unweighted_blocks
# return unweighted_blocks
return weighted_blocks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the outer return here is what happens in the absence of weighting, or absence of w^2. In the first case (no weights) weighted == unweighted, which would be fine. In the second instance, where you might have weights but fail to supply w^2, unweighted would still be correct here. The bug was only in the effective calculation above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malecki Can you re-check this? I think I figured it out finally.

Copy link
Contributor

@malecki malecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, much more reasonable set of changed expectations. Thank you!

@slobodan-ilic slobodan-ilic force-pushed the fix-column-bases-186925787 branch from a5f65ca to f6f1dee Compare February 7, 2024 10:38
@slobodan-ilic slobodan-ilic merged commit 72e864f into master Feb 8, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants