Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add capability to handle tmd input #2740

Merged
merged 34 commits into from
May 7, 2024
Merged

Conversation

martinholmer
Copy link
Collaborator

@martinholmer martinholmer commented Apr 18, 2024

Add code to handle new private tmd.csv in a manner like the private puf.csv is handled.

The private tmd.csv file contains 2021 data and has its own public tmd_weights.csv.gz file containing weights for 2021 through 2074. Also, the new public growfactors.csv file has factors extending through 2074.

These three files are created in the tax-microdata repository using the following scripts:

  • create_taxcalc_input_variables.py produces the private tmd.csv file
  • create_taxcalc_sampling_weights.py produces the public tmd_weights.csv.gz file
  • create_taxcalc_growth_factors.py produces the public growfactors.csv file

The code in this pull request still specifies the Policy.LAST_BUDGET_YEAR to be 2034; a subsequent pull request will increase it to 2074 (which has already been tested).

@martinholmer martinholmer marked this pull request as draft April 18, 2024 20:34
@martinholmer martinholmer changed the title Add capability to handle tmd.csv input file Add capability to handle tmd input file Apr 18, 2024
@martinholmer martinholmer changed the title Add capability to handle tmd input file Add capability to handle tmd input Apr 18, 2024
Copy link

codecov bot commented Apr 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.42%. Comparing base (c339245) to head (6a869c6).
Report is 1 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2740   +/-   ##
=======================================
  Coverage   99.42%   99.42%           
=======================================
  Files          13       13           
  Lines        2594     2603    +9     
=======================================
+ Hits         2579     2588    +9     
  Misses         15       15           
Flag Coverage Δ
unittests 99.42% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
taxcalc/data.py 99.24% <ø> (ø)
taxcalc/growdiff.py 100.00% <100.00%> (ø)
taxcalc/growfactors.py 100.00% <100.00%> (ø)
taxcalc/records.py 100.00% <100.00%> (ø)
taxcalc/taxcalcio.py 100.00% <100.00%> (ø)

@martinholmer martinholmer requested a review from jdebacker May 2, 2024 23:36
@martinholmer martinholmer reopened this May 5, 2024
@martinholmer martinholmer marked this pull request as ready for review May 6, 2024 02:01
- "pandas>=1.2.0"
- "bokeh>=1.4.0, <3.0.0"
- "python>=3.9, <3.12"
- "numpy>=1.20, <2.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important to provide an upper bound on the versions for certain packages, like numpy and pandas? Historically, Tax-Calculator has tried to remain compatible with the most recent versions of the dependencies.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several deprecation warnings in the code (mostly related to pandas) that say the current code will cause errrors when pandas 3.0 is used. I check the pandas web site and they said 3.0 would be released any day now.

So, the limits are temporary. They will be removed when the deprecated code gets fixed.

@jdebacker
Copy link
Member

@martinholmer I've reviewed this PR and had one small question left in my review.

But there is a larger issue that this PR touches on that might be worth discussion before merging. There has been work in the past (see PR #2538) to disentangle the tax calculator logic from the data used for specific calculations. Things like the cps_constructor method and the default use of the puf.csv are a convenience, but come at a cost of making implicit assumptions about the quality of certain data files that are outside the scope of Tax-Calculator. This PR adds additional methods for convenience with a(nother) particular data file. I can see the advantage of these conveniences, but on the other hand it does push in the opposite direction that the project was hoping to go. cc @MattHJensen thoughts?

@martinholmer
Copy link
Collaborator Author

@jdebacker said in PR #2740:

But there is a larger issue that this PR touches on that might be worth discussion before merging. There has been work in the past (see PR #2538) to disentangle the tax calculator logic from the data used for specific calculations. Things like the cps_constructor method and the default use of the puf.csv are a convenience, but come at a cost of making implicit assumptions about the quality of certain data files that are outside the scope of Tax-Calculator. This PR adds additional methods for convenience with a(nother) particular data file. I can see the advantage of these conveniences, but on the other hand it does push in the opposite direction that the project was hoping to go.

There has been no work on PR #2538 in over three years. Seems like there is little or no momentum in the "direction that the project was hoping to go".

@jdebacker
Copy link
Member

Thanks of the PR @martinholmer. Merging.

@jdebacker jdebacker merged commit e899e7a into PSLmodels:master May 7, 2024
9 checks passed
@martinholmer martinholmer deleted the add-tmd branch May 8, 2024 12:03
@martinholmer martinholmer mentioned this pull request May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants