From c4ce7837d066e9b7f1c96fa694fdbed7cee5b41d Mon Sep 17 00:00:00 2001 From: Jonas Eschle Date: Thu, 11 Apr 2024 23:21:45 -0400 Subject: [PATCH] fix: intro long --- .../introduction/Introduction_long.ipynb | 169 ++---------------- introduction/Introduction_long.ipynb | 169 ++---------------- 2 files changed, 28 insertions(+), 310 deletions(-) diff --git a/_website/tutorials/introduction/Introduction_long.ipynb b/_website/tutorials/introduction/Introduction_long.ipynb index d75d030..8b6f800 100644 --- a/_website/tutorials/introduction/Introduction_long.ipynb +++ b/_website/tutorials/introduction/Introduction_long.ipynb @@ -186,146 +186,6 @@ "data_normal.n_obs" ] }, - { - "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "## Data\n", - "\n", - "This component in general plays a minor role in zfit: it is mostly to provide a unified interface for data.\n", - "\n", - "Preprocessing is therefore not part of zfit and should be done beforehand. Python offers many great possibilities to do so (e.g. Pandas).\n", - "\n", - "zfit `Data` can load data from various sources, most notably from Numpy, Pandas DataFrame, TensorFlow Tensor and ROOT (using uproot). It is also possible, for convenience, to convert it directly `to_pandas`. The constructors are named `from_numpy`, `from_root` etc." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "import zfit\n", - "# znp is a subset of numpy functions with a numpy interface but using actually the zfit backend (currently TF)\n", - "import zfit.z.numpy as znp\n", - "from zfit import z" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "A `Data` needs not only the data itself but also the observables: the human readable string identifiers of the axes (corresponding to \"columns\" of a Pandas DataFrame). It is convenient to define the `Space` not only with the observable but also with a limit: this can directly be re-used as the normalization range in the PDF.\n", - "\n", - "First, let's define our observables" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "obs = zfit.Space('obs1', (-5, 10))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "This `Space` has limits. Next to the effect of handling the observables, we can also play with the limits: multiple `Spaces` can be added to provide disconnected ranges. More importantly, `Space` offers functionality:\n", - "- limit1d: return the lower and upper limit in the 1 dimensional case (raises an error otherwise)\n", - "- rect_limits: return the n dimensional limits\n", - "- area(): calculate the area (e.g. distance between upper and lower)\n", - "- inside(): return a boolean Tensor corresponding to whether the value is _inside_ the `Space`\n", - "- filter(): filter the input values to only return the one inside" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "size_normal = 10000\n", - "data_normal_np = np.random.normal(size=size_normal, scale=2)\n", - "\n", - "data_normal = zfit.Data.from_numpy(obs=obs, array=data_normal_np)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "The main functionality is\n", - "- nevents: attribute that returns the number of events in the object\n", - "- data_range: a `Space` that defines the limits of the data; if outside, the data will be cut\n", - "- n_obs: defines the number of dimensions in the dataset\n", - "- with_obs: returns a subset of the dataset with only the given obs\n", - "- weights: event based weights\n", - "\n", - "Furthermore, `value` returns a Tensor with shape `(nevents, n_obs)`.\n", - "\n", - "To retrieve values, in general `z.unstack_x(data)` should be used; this returns a single Tensor with shape (nevents) or a list of tensors if `n_obs` is larger then 1." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "print(\n", - " f\"We have {data_normal.nevents} events in our dataset with the minimum of {np.min(data_normal.unstack_x())}\") # remember! The obs cut out some of the data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "data_normal.n_obs" - ] - }, { "cell_type": "markdown", "metadata": { @@ -362,7 +222,7 @@ "A `Parameter` (there are different kinds actually, more on that later) takes the following arguments as input:\n", "`Parameter(human readable name, initial value[, lower limit, upper limit])` where the limits are recommended but not mandatory. Furthermore, `step_size` can be given (which is useful to be around the given uncertainty, e.g. for large yields or small values it can help a lot to set this). Also, a `floating` argument is supported, indicating whether the parameter is allowed to float in the fit or not (just omitting the limits does _not_ make a parameter constant).\n", "\n", - "Parameters have a unique name. This is served as the identifier for e.g. fit results. However, a parameter _cannot_ be retrieved by its string identifier (its name) but the object itself should be used. In places where a parameter maps to something, the object itself is needed, not its name." + "The name of the parameter identifies it; therefore, while multiple parameters with the same name can exist, they cannot exist inside the same model/loss/function, as they would be ambiguous." ] }, { @@ -376,6 +236,7 @@ "outputs": [], "source": [ "mu = zfit.Parameter('mu', 1, -3, 3, step_size=0.2)\n", + "another_mu = zfit.Parameter('mu', 2, -3, 3, step_size=0.2)\n", "sigma_num = zfit.Parameter('sigma42', 1, 0.1, 10, floating=False)" ] }, @@ -412,10 +273,7 @@ "name": "#%% md\n" } }, - "source": [ - "*PITFALL NOTEBOOKS: since the parameters have a unique name, a second parameter with the same name cannot be created; the behavior is undefined and therefore it raises an error.\n", - "While this does not pose a problem in a normal Python script, it does in a Jupyter-like notebook, since it is an often practice to \"rerun\" a cell as an attempt to \"reset\" things. Bear in mind that this does not make sense, from a logic point of view. The parameter already exists. Best practice: write a small wrapper, do not rerun the parameter creation cell or simply rerun the notebook (restart kernel & run all). For further details, have a look at the discussion and arguments [here](https://github.com/zfit/zfit/issues/186)*" - ] + "source": [] }, { "cell_type": "markdown", @@ -779,7 +637,7 @@ "\n", " nbins = 50\n", "\n", - " lower, upper = data.v1.limits\n", + " lower, upper = data.space.v1.limits\n", " x = znp.linspace(lower, upper, num=1000) # np.linspace also works\n", " y = model.pdf(x) * size_normal / nbins * data.data_range.area()\n", " y *= scale\n", @@ -863,7 +721,7 @@ }, "outputs": [], "source": [ - "mass_obs = zfit.Space('mass', (0, 1000))" + "mass_obs = zfit.Space('mass', 0, 1000)" ] }, { @@ -897,7 +755,7 @@ "source": [ "# combinatorial background\n", "\n", - "lam = zfit.Parameter('lambda', -0.01, -0.05, -0.001)\n", + "lam = zfit.Parameter('lambda', -0.01, -0.05, -0.00001)\n", "comb_bkg = zfit.pdf.Exponential(lam, obs=mass_obs)" ] }, @@ -1426,11 +1284,12 @@ "outputs": [], "source": [ "values = z.unstack_x(data)\n", - "obs_right_tail = zfit.Space('mass', (700, 1000))\n", + "obs_right_tail = zfit.Space('mass', (550, 1000))\n", "data_tail = zfit.Data.from_tensor(obs=obs_right_tail, tensor=values)\n", - "with comb_bkg.set_norm_range(obs_right_tail):\n", - " nll_tail = zfit.loss.UnbinnedNLL(comb_bkg, data_tail)\n", - " minimizer.minimize(nll_tail)" + "comb_bkg_right = comb_bkg.to_truncated(limits=obs_right_tail) # this gets the normalization right\n", + "nll_tail = zfit.loss.UnbinnedNLL(comb_bkg_right, data_tail)\n", + "result_sideband = minimizer.minimize(nll_tail)\n", + "print(result_sideband)" ] }, { @@ -1617,7 +1476,7 @@ }, "outputs": [], "source": [ - "result.hesse(method='minuit_hesse', name='hesse')" + "result.hesse(method='minuit_hesse', name='hesse') # these are the default values" ] }, { @@ -1654,7 +1513,7 @@ }, "outputs": [], "source": [ - "print(result.params)" + "print(result)" ] }, { @@ -1707,7 +1566,7 @@ }, "outputs": [], "source": [ - "print(result.params)" + "print(result)" ] }, { diff --git a/introduction/Introduction_long.ipynb b/introduction/Introduction_long.ipynb index d75d030..8b6f800 100644 --- a/introduction/Introduction_long.ipynb +++ b/introduction/Introduction_long.ipynb @@ -186,146 +186,6 @@ "data_normal.n_obs" ] }, - { - "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "## Data\n", - "\n", - "This component in general plays a minor role in zfit: it is mostly to provide a unified interface for data.\n", - "\n", - "Preprocessing is therefore not part of zfit and should be done beforehand. Python offers many great possibilities to do so (e.g. Pandas).\n", - "\n", - "zfit `Data` can load data from various sources, most notably from Numpy, Pandas DataFrame, TensorFlow Tensor and ROOT (using uproot). It is also possible, for convenience, to convert it directly `to_pandas`. The constructors are named `from_numpy`, `from_root` etc." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "import zfit\n", - "# znp is a subset of numpy functions with a numpy interface but using actually the zfit backend (currently TF)\n", - "import zfit.z.numpy as znp\n", - "from zfit import z" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "A `Data` needs not only the data itself but also the observables: the human readable string identifiers of the axes (corresponding to \"columns\" of a Pandas DataFrame). It is convenient to define the `Space` not only with the observable but also with a limit: this can directly be re-used as the normalization range in the PDF.\n", - "\n", - "First, let's define our observables" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "obs = zfit.Space('obs1', (-5, 10))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "This `Space` has limits. Next to the effect of handling the observables, we can also play with the limits: multiple `Spaces` can be added to provide disconnected ranges. More importantly, `Space` offers functionality:\n", - "- limit1d: return the lower and upper limit in the 1 dimensional case (raises an error otherwise)\n", - "- rect_limits: return the n dimensional limits\n", - "- area(): calculate the area (e.g. distance between upper and lower)\n", - "- inside(): return a boolean Tensor corresponding to whether the value is _inside_ the `Space`\n", - "- filter(): filter the input values to only return the one inside" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "size_normal = 10000\n", - "data_normal_np = np.random.normal(size=size_normal, scale=2)\n", - "\n", - "data_normal = zfit.Data.from_numpy(obs=obs, array=data_normal_np)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "pycharm": { - "name": "#%% md\n" - } - }, - "source": [ - "The main functionality is\n", - "- nevents: attribute that returns the number of events in the object\n", - "- data_range: a `Space` that defines the limits of the data; if outside, the data will be cut\n", - "- n_obs: defines the number of dimensions in the dataset\n", - "- with_obs: returns a subset of the dataset with only the given obs\n", - "- weights: event based weights\n", - "\n", - "Furthermore, `value` returns a Tensor with shape `(nevents, n_obs)`.\n", - "\n", - "To retrieve values, in general `z.unstack_x(data)` should be used; this returns a single Tensor with shape (nevents) or a list of tensors if `n_obs` is larger then 1." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "print(\n", - " f\"We have {data_normal.nevents} events in our dataset with the minimum of {np.min(data_normal.unstack_x())}\") # remember! The obs cut out some of the data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "pycharm": { - "name": "#%%\n" - } - }, - "outputs": [], - "source": [ - "data_normal.n_obs" - ] - }, { "cell_type": "markdown", "metadata": { @@ -362,7 +222,7 @@ "A `Parameter` (there are different kinds actually, more on that later) takes the following arguments as input:\n", "`Parameter(human readable name, initial value[, lower limit, upper limit])` where the limits are recommended but not mandatory. Furthermore, `step_size` can be given (which is useful to be around the given uncertainty, e.g. for large yields or small values it can help a lot to set this). Also, a `floating` argument is supported, indicating whether the parameter is allowed to float in the fit or not (just omitting the limits does _not_ make a parameter constant).\n", "\n", - "Parameters have a unique name. This is served as the identifier for e.g. fit results. However, a parameter _cannot_ be retrieved by its string identifier (its name) but the object itself should be used. In places where a parameter maps to something, the object itself is needed, not its name." + "The name of the parameter identifies it; therefore, while multiple parameters with the same name can exist, they cannot exist inside the same model/loss/function, as they would be ambiguous." ] }, { @@ -376,6 +236,7 @@ "outputs": [], "source": [ "mu = zfit.Parameter('mu', 1, -3, 3, step_size=0.2)\n", + "another_mu = zfit.Parameter('mu', 2, -3, 3, step_size=0.2)\n", "sigma_num = zfit.Parameter('sigma42', 1, 0.1, 10, floating=False)" ] }, @@ -412,10 +273,7 @@ "name": "#%% md\n" } }, - "source": [ - "*PITFALL NOTEBOOKS: since the parameters have a unique name, a second parameter with the same name cannot be created; the behavior is undefined and therefore it raises an error.\n", - "While this does not pose a problem in a normal Python script, it does in a Jupyter-like notebook, since it is an often practice to \"rerun\" a cell as an attempt to \"reset\" things. Bear in mind that this does not make sense, from a logic point of view. The parameter already exists. Best practice: write a small wrapper, do not rerun the parameter creation cell or simply rerun the notebook (restart kernel & run all). For further details, have a look at the discussion and arguments [here](https://github.com/zfit/zfit/issues/186)*" - ] + "source": [] }, { "cell_type": "markdown", @@ -779,7 +637,7 @@ "\n", " nbins = 50\n", "\n", - " lower, upper = data.v1.limits\n", + " lower, upper = data.space.v1.limits\n", " x = znp.linspace(lower, upper, num=1000) # np.linspace also works\n", " y = model.pdf(x) * size_normal / nbins * data.data_range.area()\n", " y *= scale\n", @@ -863,7 +721,7 @@ }, "outputs": [], "source": [ - "mass_obs = zfit.Space('mass', (0, 1000))" + "mass_obs = zfit.Space('mass', 0, 1000)" ] }, { @@ -897,7 +755,7 @@ "source": [ "# combinatorial background\n", "\n", - "lam = zfit.Parameter('lambda', -0.01, -0.05, -0.001)\n", + "lam = zfit.Parameter('lambda', -0.01, -0.05, -0.00001)\n", "comb_bkg = zfit.pdf.Exponential(lam, obs=mass_obs)" ] }, @@ -1426,11 +1284,12 @@ "outputs": [], "source": [ "values = z.unstack_x(data)\n", - "obs_right_tail = zfit.Space('mass', (700, 1000))\n", + "obs_right_tail = zfit.Space('mass', (550, 1000))\n", "data_tail = zfit.Data.from_tensor(obs=obs_right_tail, tensor=values)\n", - "with comb_bkg.set_norm_range(obs_right_tail):\n", - " nll_tail = zfit.loss.UnbinnedNLL(comb_bkg, data_tail)\n", - " minimizer.minimize(nll_tail)" + "comb_bkg_right = comb_bkg.to_truncated(limits=obs_right_tail) # this gets the normalization right\n", + "nll_tail = zfit.loss.UnbinnedNLL(comb_bkg_right, data_tail)\n", + "result_sideband = minimizer.minimize(nll_tail)\n", + "print(result_sideband)" ] }, { @@ -1617,7 +1476,7 @@ }, "outputs": [], "source": [ - "result.hesse(method='minuit_hesse', name='hesse')" + "result.hesse(method='minuit_hesse', name='hesse') # these are the default values" ] }, { @@ -1654,7 +1513,7 @@ }, "outputs": [], "source": [ - "print(result.params)" + "print(result)" ] }, { @@ -1707,7 +1566,7 @@ }, "outputs": [], "source": [ - "print(result.params)" + "print(result)" ] }, {