Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy sources #82

Closed
wants to merge 15 commits into from
Closed

Lazy sources #82

wants to merge 15 commits into from

Conversation

beasteers
Copy link
Contributor

This implements a discussion had in #80.

Basically, ConfigSource control the entire concept of data loading and state, so you can have partially loaded datasets and only fall back on lower priority sources when the keys are missing.

So you can do:

import confuse
config = confuse.Configuration('mypkg', source=[
    confuse.EnvSource('MYPKG'),
    '~/mypkg',
    confuse.YamlSource('/etc/mypkg-settings.yml'),
])
config.sources

outputs

[EnvSource({}, prefix=MYPKG, separator=__),
 YamlSource({'a': 5}, filename=/Users/bsteers/mypkg/config.yaml, default=False),
 YamlSource([Source doesn't exist], filename=/etc/mypkg-settings.yml, default=False),

 YamlSource([Source doesn't exist], filename=/Users/bsteers/.config/mypkg/config.yaml, default=False)]

to demonstrate ConfigSource more directly (pulled from the other PR):

# equivalent
c = ConfigSource.of('asdf.yml')
c = YamlSource('asdf.yml')
# lazy loading
c = ConfigSource.of('asdfa.yml')
print('loaded?', c.loaded, c)
print('a =', c['a'])
print('loaded?', c.loaded, c)

outputs

loaded? False ConfigSource({}, asdfa.yml, False)
a = 5
loaded? True ConfigSource({'a': 5, 'b': 6, 'file': 'asdfa.yml'}, asdfa.yml, False)

@beasteers
Copy link
Contributor Author

beasteers commented Apr 24, 2020

Things to fix:

  • python2 functools.wraps doesn't work for methods - hmmm might need to update func name/doc manually or something
  • flake "prettiness" errors
  • add configsource tests
  • For some reason, in a very specific scenario (tox nosetests w/ python 2.7), calling dict(source) doesn't call dict.keys as expected. From what I can tell, in that specific case, it seems to be calling dict.keys and not the overridden ConfigSource.keys (where we call load).
# python 2.7
# runs fine
nosetests tests/test-sources.py
# throws error
tox tests/test-sources.py

I spent an hour this morning trying to pin this down, but I still can't figure out how the problem is arising. It's not exclusively a py2.7 problem, because I can get it working in a py2 environment using nosetests/pytest

This shouldn't ever be an issue in production tho because I changed dict(source) => dict(source.load()) in resolve to make sure this case is covered, but it's just a weird issue that's bugging me.

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems cool so far; thanks!! I jotted a few low-level questions inline.

I realize that this is a lot to ask, but is there any chance I could get your help splitting this up into one feature at a time? I would love to do the lazy source thing first and make sure that causes no problems with existing apps that use LazyConfig (i.e., hopefully it works exactly the same way as the old mechanism for laziness). Then we can merge the more convenient Config construction thing next, based on that. Would that make sense?



def _load_first(func):
'''Call self.load() before the function is called - used for lazy source
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we stick with the PEP 8-recommended triple-double quotes for docstrings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sure. I don't like the style of unnecessary double quotes when single will do, but since it's pep8, ok.

try:
return wraps(func)(inner)
except AttributeError:
# in v2 they don't ignore missing attributes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a little more context: what does the AttributeError on Python 2 look like?

Copy link
Contributor Author

@beasteers beasteers Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Traceback (most recent call last):
  File "/home/travis/build/beetbox/confuse/example/__init__.py", line 3, in <module>
    import confuse
  File "/home/travis/build/beetbox/confuse/confuse.py", line 157, in <module>
    class ConfigSource(dict):
  File "/home/travis/build/beetbox/confuse/confuse.py", line 213, in ConfigSource
    __getitem__ = _load_first(dict.__getitem__)
  File "/home/travis/build/beetbox/confuse/confuse.py", line 145, in _load_first
    @wraps(func)
  File "/opt/python/2.7.15/lib/python2.7/functools.py", line 33, in update_wrapper
    setattr(wrapper, attr, getattr(wrapped, attr))
AttributeError: 'method_descriptor' object has no attribute '__module__'

In py3, they skip missing attributes in functools.wraps.

self.filename, self.default)

@property
def exists(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about why this is necessary—shouldn't it be an error to try using a source that doesn't have a file that exists?

Copy link
Contributor Author

@beasteers beasteers Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of a source that doesn't exist comes from the desire to create a config dir that doesn't exist.

The full list of sources is created without touching the filesystem, and then on read, it will go thru each source and do any file loading and missing directory creation. If we don't want to be creating a config directory, then we can throw away sources who's file doesn't exist.

As stated elsewhere - by default yaml files will throw an error if they don't exist.

this abstraction was to let other sources (EnvSource, ConfigSource({}), CustomSource) define whether they should be considered existing or not. I initially added this for a more informative __repr__, but it may be useful elsewhere

'''Ensure that the source is loaded.'''
if not self.loaded:
self.config_dir()
self.loaded = self._load() is not False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I'm a tad confused about why this could ever be false… shouldn't load either always succeed or throw an exception? It seems like things could get pretty confusing if sources are allowed to silently fail.

Or if the idea is that missing files should be loadable and just silently produce no values (i.e., an empty dict), that's cool too… but perhaps that should be optional? (And it should still mark the source as "loaded" so that we don't keep trying?)

Copy link
Contributor Author

@beasteers beasteers Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a mechanism to allow a source subclass to signify that a source wasn't loaded and to try again next time. It's only if a _load function explicitly returns False.

It seems like things could get pretty confusing if sources are allowed to silently fail.

Whether they are silently ignored is optional and is disabled by default.

(YamlSource(path, optional=True) will ignore a source if it's missing, YamlSource(path) will throw an error like normal).

I added it in because that's how the user config is handled (the source is kept so it will create the config dir on read).

the idea is that missing files should be loadable and just silently produce no values

yes If a user wants an optional source, they can set optional=True

it should still mark the source as "loaded" so that we don't keep trying?

The reason for the retry mechanism is to allow a source to be loaded whenever it becomes available. As a general mechanism, it's really flexible and powerful, but I agree that allowing retry and delayed loading could have unexpected results in some situations so it should probably be disabled by default.

a `ConfigSource` object. This lets a function accept either type
of object as an argument.
"""
def is_of_type(cls, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mechanism for automatically detecting which type of source to use is a little too "magical," I think, for a first cut. If it's OK with you, can we leave this out for now? (Concretely, an issue with searching for all subclasses of a given class is that it is implicitly extensible—anyone who creates a subclass of ConfigSource will automatically/transparently be extending this overload system, which can be somewhat surprising.)

Perhaps as a more convenient alternative to explicitly using YamlSource we can just hard-code some API to use this that class exclusively?

Copy link
Contributor Author

@beasteers beasteers Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I tend to agree - to be more explicit, we can just mimic as_template.

Currently, by default any class extending ConfigSource would have to extend is_of_type in order to use source inference so there isn't as much risk.

My bigger concern is that resolution order is (?) undefined (?) or at least not immediately apparent.

Tbh it was just an easy way to try to have ConfigSource be externally extensible without modifying the confuse core, but I agree that it isn't optimal and I'm fine with looking at an alternative method.

self.update(self._loader(self.filename))


class EnvSource(ConfigSource):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty cool! But it's a separate feature—maybe we can discuss it separately? (We can also look into writing some documentation for the new variable-based route to configuration…)

Copy link
Contributor Author

@beasteers beasteers Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this can be added in a separate PR. it was just a simple demonstration of source extensibility.


def resolve(self):
return ((dict(s), s) for s in self.sources)
return ((dict(s.load()), s) for s in self.sources)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps dict(s) on a source should implicitly call load? (In fact, maybe it already does?)

Copy link
Contributor Author

@beasteers beasteers Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it totally does - and all tests work fine locally, except in tox/travis python 2.7 tests (but not python 2 w/ pytest/nosetests)...

I'm hoping to get rid of this tho. See comment below

self.sources.append(ConfigSource.of(obj))
src = ConfigSource.of(obj)
self.sources.append(src)
return src
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like add and set (below) both now return the newly-created configuration source. That seems cool enough, but can you perhaps elaborate a bit on the use case you had in mind? In a way it doesn't seem that hard for clients to keep a reference to their own source that they pass in, if that's wha they want. (Or maybe this is used somewhere else internally?) In any case, it would be great to add docs for this new behavior too if we need it…

Copy link
Contributor Author

@beasteers beasteers Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was more for sources that are implicitly created internally using ConfigSource.of:

config.add({'a': 5}) # calls ConfigSource.of(value)

People can access via config.sources[i], but I figured it's less error prone and there isn't really harm to returning it. It only came up because a test was trying to reference by index which is currently dependent on the existence of files.

@sampsyo
Copy link
Member

sampsyo commented Apr 26, 2020

Ah, I see belatedly that you already commented on that extra dict(...) thing! Oops. That's super weird. My best guess is that the dict constructor might just iterate over the keys, which would call __iter__ but not __keys__? Pretty strange…

@beasteers
Copy link
Contributor Author

beasteers commented Apr 26, 2020

Ah, I see belatedly that you already commented on that extra dict(...) thing! Oops. That's super weird. My best guess is that the dict constructor might just iterate over the keys, which would call iter but not keys? Pretty strange…

But I'm wrapping __getitem__, __iter__, keys, values....... so it should load prior to all of those

I tried printing out in __getattribute__ and:

in the passing tests, it prints out: keys, keys, loaded, load, etc.
but in the failed test, it only prints out: keys

So that implies that it's just calling dict.keys and not our overridden method - which is really weird and idk why it'd be doing that.


edit: from what I can tell the problem seems to be that it's getting .keys, but not calling it.

Adding this caused the test to pass

class ConfigSource(dict):
    def __getattribute__(self, k):
        x = super(ConfigSource, self).__getattribute__(k)
        if k == 'keys':
            print(k, x())
        return x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants