Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Haystack should not configure root logger handlers #8681

Open
1 task done
CSRessel opened this issue Jan 3, 2025 · 0 comments
Open
1 task done

Haystack should not configure root logger handlers #8681

CSRessel opened this issue Jan 3, 2025 · 0 comments
Labels
P2 Medium priority, add to the next sprint if no P1 available

Comments

@CSRessel
Copy link

CSRessel commented Jan 3, 2025

Describe the bug
Any application that imports this library cannot expect their own configuration of the Python root logger to be respected, because this library adds to the root logger's list of handlers.

This issue occurred previously in #2485 and #4202

Expected behavior
An application using this library should be able to import haystack and then use logging.basicConfig() as normal.

Additional context
This issue was introduced here

This is an issue because logging.basicConfig() is ignored once any handlers are configured. At a bare minimum, it is reasonable to expect all libraries make no modifications to the root handler. The quickest fix is to edit line 89 so as to only add the handler onto the subloggers that will be used throughout the library:

    haystack_logger = logging.getLogger("haystack")  # only add our handler within our library's hierarchy

    # avoid adding our handler twice
    old_handlers = [
        h for h in haystack_logger.handlers
        if (isinstance(h, logging.StreamHandler) and h.name == "HaystackLoggingHandler")
    ]
    for old_handler in old_handlers:
        haystack_logger.removeHandler(old_handler)

    haystack_logger.addHandler(handler)

    # or more succinctly, only add if not already present
    # if not old_handlers:
    #     haystack_logger.addHandler(handler)

However, it is also generally expected that the application and not the library is the arbiter of all log handlers, as recommended in the python docs' Logging Cookbook. This would mean it is unusual for any library to implicitly add a log handler -- it is the application developer who knows best what log formats they need.

I agree that providing recommended overrides can be very convenient; one route would be to export a factory for the provided handler so that the consuming application can easily opt-in to this feature:

from haystack.logging import configure_logging_handler  # function to create the HaystackLoggingHandler
logging.getLogger().addHandler(configure_logging_handler())  # app dev can choose to add at the root, at the haystack level, or not at all

Quick blog post summary of developer expectations on this topic: http://rednafi.com/python/no_hijack_root_logger/

To Reproduce
Minimal repro:

from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

import logging
import pandas as pd
logging.basicConfig(level=logging.CRITICAL)

document_store = InMemoryDocumentStore()
document_store.write_documents([
    # still prints a warning, because of the logging.getLogger().root changes within haystack
    Document(
        content="My name is Jean and I live in Paris.",
        dataframe=pd.DataFrame({"name": ["Jean"], "city": ["Paris"]}),
    ),
])

FAQ Check

System:

  • OS: Ubuntu 22.04
  • GPU/CPU: N/A
  • Haystack version (commit or version number): 2.7.0 in my testing, up to present
  • DocumentStore: N/A
  • Reader: N/A
  • Retriever: N/A
@julian-risch julian-risch added the P2 Medium priority, add to the next sprint if no P1 available label Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 Medium priority, add to the next sprint if no P1 available
Projects
None yet
Development

No branches or pull requests

2 participants