-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add context manager functionality using hooks #121
base: main
Are you sure you want to change the base?
Add context manager functionality using hooks #121
Conversation
A different (easier/safer) approach to lazy importing using a meta import hook.
Wow, this almost looks too good to be true 👀 What's the catch! Is https://github.com/scientific-python/lazy-loader/pull/121/files#diff-0f92aed2087ad50adf43e527efcd07237adacc5031b7296ae1ece2693462986bR320 a recommended practice for taking over imports? |
certainly does look elegant! I don't have much experience with the practical details of using
Note Note For projects where startup time is critical, this class allows for potentially minimizing the cost of loading a module if it is never used. For projects where startup time is not essential then use of this class is heavily discouraged due to error messages created during loading being postponed and thus occurring out of context. Certainly, none of that is a deal-breaker (and indeed, we're already deep down in the magic here simply by trying to support lazy imports in a type safe way), but perhaps those warnings should be passed along in the readme here along with docs on how to use this feature |
So, adding the finder to
I didn't either. I just saw it inside the |
@stefanv asked me to take a look at this. Planning to review soon, but messaging in advance so you can ping me if I drop the ball. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this approach seems pretty sound. I've done a decent bit of hacking around importlib, but I haven't used the LazyLoader specifically before so I did a bit of testing. I was mainly concerned about whether extension modules or built-ins might use custom loaders, but built-ins do not and the extension modules I tested (built either directly with the Python C API or using pybind11) seemed to be fine. On the restrictions:
This class only works with loaders that define exec_module() as control over what module type is used for the module is required.
The restriction that finders must use exec_module
is basically a statement that this doesn't work with finders written prior to Python 3.4 because exec_module
and create_module
were the replacements introduced then and by default load_module
simply calls those two functions in modern versions of Python. That seems fine.
For those same reasons, the loader’s create_module() method must return None or a type for which its class attribute can be mutated along with not using slots.
Similarly, I don't think any modern loaders will return something with an immutable __class__
.
Finally, modules which substitute the object placed into sys.modules will not work as there is no way to properly replace the module references throughout the interpreter safely; ValueError is raised if such a substitution is detected.
I could see this restriction impacting one of the more common use cases of custom loaders, which is to intercept imports of a specific package. In some of those cases it is reasonable to imagine sys.modules
being rewritten as a way to alias or mask a package's import. However, I still think that is a special enough case that expecting lazy_imports
to work seamlessly with those custom loaders is out of scope.
return spec | ||
|
||
|
||
sys.meta_path.insert(0, LazyFinder()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most meta path finders will be targeted at specific set of modules. In those cases it is probably fine to insert the finder like this because in general if there are multiple finders/loaders affecting the same set of modules and injecting specialized behavior there is no way to guarantee that they can both do so safely anyway. In this case, however, we need the LazyFinder to respect other finders. If a finder is inserted into the meta path after this finder then it could preempt the lazy find. To address that, instead of inserting the finder here we should insert it inside the __enter__
of the context manager and then remove it in __exit__
with something like
with threading.Lock():
sys.meta_path = [f for f in sys.meta_path if not isinstance(finder, LazyFinder)]
We will need to lock meta_path modification (both insertion and deletion) for thread safety. Since the lazy_imports
context is by construction not reentrant it should be fine to not include any logic to handle the object already being in the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not answering earlier, as I was on vacation.
I know that it is possible for another hook to be inserted before this one after this is imported. However, that would only happen if it is explicitly inserted at the beginning of the list, which is not very common.
Although ensuring that we are at the first position on __enter__
seems sensible at first glance, in a multithread context this simply cannot be done safely, AFAIK, because hooks are global and not per-thread, so you would be messing with the meta path for other threads too (and other threads could change it too!).
Your code seems broken. Not only it assumes that all threads would synchronize the meta path using the same lock (impracticable), but if I am not mistaken you are creating a new lock every time, so you are not even protecting anything (see pylint-dev/pylint#5208).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem!
Apologies, my example wasn't intended to be taken verbatim. Concretely, I would suggest storing a lock either at module scope or as a class member of the LazyFinder (e.g. upon first import of the lazy_loader
itself). Then you would have a single instance that is shared across all threads and all import calls. I am confused as to why you think having all threads synchronizing the meta path via the same lock would be impractical. In general, thread-safe code requires that all threads would use the same lock, otherwise the lock serves no purpose. Perhaps I am misunderstanding what you mean? Are you concerned that imports might be occurring in a performance-sensitive context? My assumption is that even if you have dozens/hundreds/more threads, the import shouldn't be on the critical path for performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, what I mean is that you cannot control what other threads do. In a multithreading context some threads may be created by other libraries or the application. These could in principle access the list while you are checking/changing it, and they won't use your lock. You can only protect by locking the APIs that you define, not the APIs defined elsewhere, unless you are in complete control of all threads of the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, if any other library includes modification of sys.meta_path
beyond simply usage of this context manager in different threads then I agree that we have no control over that. There is nothing that we can do in that case. The question to me is really which case has more risk; allowing the lazy finder to be preempted by different insertion order, or allowing the lazy finder to have possible nondeterministic behavior in a multithreaded scenario where multiple distinct code paths modify sys.meta_path
. I was assuming the latter would be far less likely than the former (granting that the former is also quite unlikely) and was largely operating under the assumption that the lazy loader would be the only code modifying sys.meta_path
in a multithreaded context. OTOH nondeterministic code is a far greater evil and I wasn't really thinking much about it. I'm fine with leaving the implementation as is and just accepting the limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I think we could do mostly safely is to warn if we are not in first position, possibly even listing which other hooks are before ours, so that users are aware of that and do not spend hours to debug it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, worst case with the warning is that we report a false positive if some other libraries is inserted and then removed with bad timing in another thread.
if spec is None: | ||
raise ModuleNotFoundError(f"No module named '{fullname}'") | ||
|
||
spec.loader = importlib.util.LazyLoader(spec.loader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the lazy loader produces a delayed evaluation such that any import errors will show up later than expected, and since the user of the lazy_imports
context may not realize what is happening under the hood, it might make sense to subclass the LazyLoader and do something like
class MyLazyLoader(LazyLoader):
def exec_module(self, module):
try:
super().exec_module(module)
except BaseException as e:
raise ImportError(f"Lazy load of module {module} failed") from e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I in principle agree with improving the user reporting of errors, so this sounds fine to me.
After @vyasr's review (thank you!) that this approach is viable, so should we try to get this out there for people to try? |
Hi, @vnmabus mentioned this on a python discuss thread about deferred (originally async) imports so I thought I'd comment here as I ran into some issues trying it out. It seems like you need to import Code:
Error:
Code for I also see some examples that use (This is with the my_module.py
lazy_demo.py
lazy_demo2.py
Edit: Noticed the initial error was from a different run than the script from the traceback, must have copied from the wrong run. |
@DavidCEllis Thanks very much for taking a look. More generally speaking, given your depth of experience, if you run into any issues with |
There's some other existing prior/parallel art here with the context manager approach. Something that's done in prior/parallel art is declaring which namespaces should be affected. Taken from the readme of another project working on this problem: with defer_imports.install_import_hook(module_names=("discord",), recursive=True):
... This avoids your above issue with needing to be careful about importing something (threading) merely for effect before using this context manager. |
Thanks @DavidCEllis and @mikeshardmind for giving feedback! Apparently the issue with threading is new, due to a recent commit (python/cpython#117983). For solving that, and the "from" issue maybe we should use our own class, to have more control of what is loaded and when. In #70 I had another draft implementation that relied on replacing temporarily the To be honest, I am not an expert on importing internals, so I hoped that one of these PRs was inspirational enough to "nerd-snip" someone with more knowledge that can help finish the implementation successfully. |
I'm not sure you can cleanly handle The When a module is imported the partially initialised module has its (Note that even with By contrast the objects retrieved with a The solution I use myself for lazy imports uses roughly the same mechanics as your Instead of building a function that returns a Does make a noticeable performance improvement in the CLI tools I use it for. Still doesn't play nice with static analysis sadly. Footnotes
|
Wouldn't an hybrid solution be possible, in which the |
I think if you want to prevent any import machinery from happening you might be better served by replacing Trying to match things up does get a bit hairy when you're trying to handle |
Something along these lines might work the way you're suggesting? I'm not sure I like it though. https://gist.github.com/DavidCEllis/29c5e77262f522c26b9f810fcb031a43 |
What I worried with replacing |
Yes, that's one of the reasons I didn't really like it. In the context of multiple threads trying to import simultaneously you'll run into issues. Even with path hooks, something else could insert a path hook and break the process though. Defining the imports separately either in a stub or in a dict for attach is probably still the safest way, but I'll have another look to see if there's a way to use path hooks. Edit: I think it takes some of the tools you'd need to do this out of your hands, but I'm not certain yet. |
The docs for the import system state (emphasis mine):
However I haven't actually found a way to make it so the import statement is only replaced in one module, maybe I'm missing something or that's not what this sentence is supposed to imply. I did make a handful of changes to mitigate some of the issues anyway. Imports that occur in other modules while the replacement If |
This is an alternative (simpler and safer) implementation of #70 using a meta hook. By using this approach, typing should work without the need of repeating code or using typing stubs (it would allow to remove most of the code in this package, unless there is some problem that I did not notice).
The syntax used to define the lazy imports would be the same as #70:
This is currently a draft PR because I wanted to receive feedback and do more tests. It should work with any kind of import.
The code should be quite easy to understand and extend if necessary.
One think to take into account: I did NOT implement
error_on_import=False
with this syntax. I think that it would complicate the implementation and I see no point in delaying to report theImportError
to the user.@tlambert03, @stefanv what are your thoughts on this?