-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Organize outputs directory as nested folders #2296
base: main
Are you sure you want to change the base?
Organize outputs directory as nested folders #2296
Conversation
Can one of the admins verify this patch? |
Thanks! Let me first address one of your points/questions before proceeding:
The function is below. Lines 589 to 598 in f6bc96c
The important bits are lines 591-594. There, we are basically turning def add(a, b):
return a + b into from quacc import change_settings
def add(a, b):
with change_settings(changes):
return a + b This is necessary because So, the quacc/src/quacc/wflow_tools/decorators.py Lines 147 to 148 in f6bc96c
In short, if a user passes
Now we have successfully changed the settings within the the with change_settings({"RESULTS_DIR": "~/test2"}):
with change_settings({"RESULTS_DIR": "~/test1"}):
static_job(atoms) this came up in practice because they redecorated the same function twice without renaming it: static_job = redecorate(static_job, job(settings_swap={"RESULTS_DIR": "~/test1"}))
static_job(atoms)
static_job = redecorate(static_job, job(settings_swap={"RESULTS_DIR": "~/test2"}))
static_job(atoms) this but this is an edge case and a scenario of "the user shouldn't technically do that", so I told them if it caused issues it may be removed. in any case, not much to worry about at the moment. I'll take care of other workflow engines as needed. if tests pass, then we're fine. I added a lot of tests for the settings handling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zulissimeta: Thanks for this PR! I will have to try it out, but looking at the code and your writeup, this seems logical to me. I do not have major objections (at this time).
As for this PR, next steps would be:
- Figure out why the tests are failing and update the code accordingly. We have three tests failing in
tests/prefect/test_customizers.py
. - Address the (minor) comments in my review.
- Add tests for this newly introduced behavior so we can ensure we don't break it.
- Extend this to other workflow engines (this is for me to do).
As for your "downsides":
Passing around wrapped functions increases complexity
It is true that wrapping functions increases complexity. That said, we already are wrapping functions once if change_settings_wrap
is applied. Additionally, even without the change_settings
business, we were already wrapping in a few places depending on the workflow engine (e.g. for Prefect when we have PREFECT_AUTO_SUBMIT
as True
). So, here we have either single or double wrapping depending on the workflow engine. That should hopefully be okay because we are already doing that in the test suite as-is. You haven't really increased the complexity from a wrapping standpoint in this PR.
You can only define the task/flow/etc at runtime in the flow, which could lead to some downstream weird logic issues
I don't think I understand this point. Could you clarify?
Prefect throws some warnings that different functions with the same name are being used (it still works)
Is this a new warning, or was it always present? Shouldn't the name of the function not change if we use @wraps
? I'm curious what happened here to cause that metadata change.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2296 +/- ##
==========================================
- Coverage 98.45% 98.35% -0.11%
==========================================
Files 84 84
Lines 3430 3457 +27
==========================================
+ Hits 3377 3400 +23
- Misses 53 57 +4 ☔ View full report in Codecov by Sentry. |
Done! (except for 4.)
After thinking some more, I made it so the change_settings_wrap keeps a reference to the changes, both so that the nested directory wrapper knows when not to set the new directory (if it was manually specified), and so that it can just be added to the other changes.
If you don't have this, you can definitely schedule a bunch of future jobs (say a chain of dependent jobs). With this change, the function actually has to be called with each input in the calling flow. I'm not aware of a concrete problem here, but making it so that the wrapped function 100% has to be called to get it to add the right context manager might lead to problems later. For example, in prefect there's a job.map() that might not explicitly call functions, and instead just schedule them directly on the server.
This is a new warning that will happen for any wrapped function (not just in my PR, but with change_settings too). Wrapped functions are different functions, but have the same name, which prefect is warning about. |
This seems logical. I did some refactoring to reduce code duplication. Before this can be merged, the following will need to be addressed:
I can work on this soon but might be a bit slower than usual, as I'm getting ready to move across the country this weekend. |
Pinging @honghuikim since you may be interested in this PR based on your prior contribution. |
@Andrew-S-Rosen Thanks for the tag. I just skimmed this PR, and it looks great! I like this "nested directory" concept, which will be helpful especially for flows. Hi @zulissimeta, I like the concept of making nested directories, and this quite resembles the purpose of my previous PR (change_settings_wrap). I have one suggestion. Maybe we can try assigning the parent RESULTS_DIR inside def customize_funcs(
names: list[str] | str,
funcs: list[Callable] | Callable,
param_defaults: dict[str, dict[str, Any]] | None = None,
param_swaps: dict[str, dict[str, Any]] | None = None,
decorators: dict[str, Callable | None] | None = None,
) -> tuple[Callable, ...] | Callable:
parameters = recursive_dict_merge(param_defaults, param_swaps)
decorators = decorators or {}
if QUACC_NESTED_RESULTS_DIR:
time_now = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H-%M-%S-%f")
if prefix is None:
prefix = ""
flow_dir_dict = {"all": job(settings_swap={"RESULTS_DIR": Path(f"{prefix}{time_now}-{randint(10000, 99999)}")})}
decorators = recursive_dict_merge(flow_dir_dict , decorators)
# and the rest of the original code By using I might have missed some key points to consider, so this might be oversimplifying the task. Please let me know your thoughts on this. |
Thank you, @honghuikim, for this very interesting suggestion! It would be great if we could have the logic inside something like One small subtlety to keep track of is in the following proposed set of lines: flow_dir_dict = {
"all": job(
settings_swap={
"RESULTS_DIR": Path(f"{prefix}{time_now}-{randint(10000, 99999)}")
}
)
}
decorators = recursive_dict_merge(flow_dir_dict, decorators) If the user passes in Anyway, in full disclosure, I probably can't tackle this in the very short term. But I would be happy to review such an approach, which I think would further improve this PR. |
@Andrew-S-Rosen Thanks a lot for the mini-review of it. That indeed is an error to be solved if we use this approach. My short thought is to use an internally defined hidden key (e.g., named I will also be able to dig into this more later, probably a week later. In the meantime, it would be great if zulissimeta could provide his thoughts on this. Especially regarding potential errors that could arise from using this approach since he has been tackling this. |
I don't have strong feelings on this, and I agree that an approach that is agnostic to the specific workflow would be nice. However, putting this in |
That is correct. However, in the quacc codebase itself, there is a strict requirement that all flows call |
Following up the discussion here to state that packing this logic inside In short, we want to decouple directory handling from any optional behavior. But we do need it to work for all workflow engines. |
I agree, this is correct. Actually, changing the settings of a flow using |
Summary of Changes
This is a draft PR to organize the RESULTS_DIR as a nested directory based on the DAG of the calling functions. For example, running the EMT example in the docs:
with the new setting
QUACC_NESTED_RESULTS_DIR=True
will lead to a RESULTS_DIR with layout:This is only a demo to inspire discussion, and is only implemented for prefect right now (though others would be easy to add!).
A couple of potential downsides: