fix: intro long

zfit · Apr 12, 2024 · c4ce783
1 parent 198ba2b
commit c4ce783

Showing 2 changed files with 28 additions and 310 deletions.
diff --git a/_website/tutorials/introduction/Introduction_long.ipynb b/_website/tutorials/introduction/Introduction_long.ipynb
@@ -186,146 +186,6 @@
     "data_normal.n_obs"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "pycharm": {
-     "name": "#%% md\n"
-    }
-   },
-   "source": [
-    "## Data\n",
-    "\n",
-    "This component in general plays a minor role in zfit: it is mostly to provide a unified interface for data.\n",
-    "\n",
-    "Preprocessing is therefore not part of zfit and should be done beforehand. Python offers many great possibilities to do so (e.g. Pandas).\n",
-    "\n",
-    "zfit `Data` can load data from various sources, most notably from Numpy, Pandas DataFrame, TensorFlow Tensor and ROOT (using uproot). It is also possible, for convenience, to convert it directly `to_pandas`. The constructors are named `from_numpy`, `from_root` etc."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "pycharm": {
-     "name": "#%%\n"
-    }
-   },
-   "outputs": [],
-   "source": [
-    "import matplotlib.pyplot as plt\n",
-    "import numpy as np\n",
-    "import zfit\n",
-    "# znp is a subset of numpy functions with a numpy interface but using actually the zfit backend (currently TF)\n",
-    "import zfit.z.numpy as znp\n",
-    "from zfit import z"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "pycharm": {
-     "name": "#%% md\n"
-    }
-   },
-   "source": [
-    "A `Data` needs not only the data itself but also the observables: the human readable string identifiers of the axes (corresponding to \"columns\" of a Pandas DataFrame). It is convenient to define the `Space` not only with the observable but also with a limit: this can directly be re-used as the normalization range in the PDF.\n",
-    "\n",
-    "First, let's define our observables"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "pycharm": {
-     "name": "#%%\n"
-    }
-   },
-   "outputs": [],
-   "source": [
-    "obs = zfit.Space('obs1', (-5, 10))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "pycharm": {
-     "name": "#%% md\n"
-    }
-   },
-   "source": [
-    "This `Space` has limits. Next to the effect of handling the observables, we can also play with the limits: multiple `Spaces` can be added to provide disconnected ranges. More importantly, `Space` offers functionality:\n",
-    "- limit1d: return the lower and upper limit in the 1 dimensional case (raises an error otherwise)\n",
-    "- rect_limits: return the n dimensional limits\n",
-    "- area(): calculate the area (e.g. distance between upper and lower)\n",
-    "- inside(): return a boolean Tensor corresponding to whether the value is _inside_ the `Space`\n",
-    "- filter(): filter the input values to only return the one inside"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "pycharm": {
-     "name": "#%%\n"
-    }
-   },
-   "outputs": [],
-   "source": [
-    "size_normal = 10000\n",
-    "data_normal_np = np.random.normal(size=size_normal, scale=2)\n",
-    "\n",
-    "data_normal = zfit.Data.from_numpy(obs=obs, array=data_normal_np)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "pycharm": {
-     "name": "#%% md\n"
-    }
-   },
-   "source": [
-    "The main functionality is\n",
-    "- nevents: attribute that returns the number of events in the object\n",
-    "- data_range: a `Space` that defines the limits of the data; if outside, the data will be cut\n",
-    "- n_obs: defines the number of dimensions in the dataset\n",
-    "- with_obs: returns a subset of the dataset with only the given obs\n",
-    "- weights: event based weights\n",
-    "\n",
-    "Furthermore, `value` returns a Tensor with shape `(nevents, n_obs)`.\n",
-    "\n",
-    "To retrieve values, in general `z.unstack_x(data)` should be used; this returns a single Tensor with shape (nevents) or a list of tensors if `n_obs` is larger then 1."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "pycharm": {
-     "name": "#%%\n"
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\n",
-    "    f\"We have {data_normal.nevents} events in our dataset with the minimum of {np.min(data_normal.unstack_x())}\")  # remember! The obs cut out some of the data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "pycharm": {
-     "name": "#%%\n"
-    }
-   },
-   "outputs": [],
-   "source": [
-    "data_normal.n_obs"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {
@@ -362,7 +222,7 @@
     "A `Parameter` (there are different kinds actually, more on that later) takes the following arguments as input:\n",
     "`Parameter(human readable name, initial value[, lower limit, upper limit])` where the limits are recommended but not mandatory. Furthermore, `step_size` can be given (which is useful to be around the given uncertainty, e.g. for large yields or small values it can help a lot to set this). Also, a `floating` argument is supported, indicating whether the parameter is allowed to float in the fit or not (just omitting the limits does _not_ make a parameter constant).\n",
     "\n",
-    "Parameters have a unique name. This is served as the identifier for e.g. fit results. However, a parameter _cannot_ be retrieved by its string identifier (its name) but the object itself should be used. In places where a parameter maps to something, the object itself is needed, not its name."
+    "The name of the parameter identifies it; therefore, while multiple parameters with the same name can exist, they cannot exist inside the same model/loss/function, as they would be ambiguous."
    ]
   },
   {
@@ -376,6 +236,7 @@
    "outputs": [],
    "source": [
     "mu = zfit.Parameter('mu', 1, -3, 3, step_size=0.2)\n",
+    "another_mu = zfit.Parameter('mu', 2, -3, 3, step_size=0.2)\n",
     "sigma_num = zfit.Parameter('sigma42', 1, 0.1, 10, floating=False)"
    ]
   },
@@ -412,10 +273,7 @@
      "name": "#%% md\n"
     }
    },
-   "source": [
-    "*PITFALL NOTEBOOKS: since the parameters have a unique name, a second parameter with the same name cannot be created; the behavior is undefined and therefore it raises an error.\n",
-    "While this does not pose a problem in a normal Python script, it does in a Jupyter-like notebook, since it is an often practice to \"rerun\" a cell as an attempt to \"reset\" things. Bear in mind that this does not make sense, from a logic point of view. The parameter already exists. Best practice: write a small wrapper, do not rerun the parameter creation cell or simply rerun the notebook (restart kernel & run all). For further details, have a look at the discussion and arguments [here](https://github.com/zfit/zfit/issues/186)*"
-   ]
+   "source": []
   },
   {
    "cell_type": "markdown",
@@ -779,7 +637,7 @@
     "\n",
     "    nbins = 50\n",
     "\n",
-    "    lower, upper = data.v1.limits\n",
+    "    lower, upper = data.space.v1.limits\n",
     "    x = znp.linspace(lower, upper, num=1000)  # np.linspace also works\n",
     "    y = model.pdf(x) * size_normal / nbins * data.data_range.area()\n",
     "    y *= scale\n",
@@ -863,7 +721,7 @@
    },
    "outputs": [],
    "source": [
-    "mass_obs = zfit.Space('mass', (0, 1000))"
+    "mass_obs = zfit.Space('mass', 0, 1000)"
    ]
   },
   {
@@ -897,7 +755,7 @@
    "source": [
     "# combinatorial background\n",
     "\n",
-    "lam = zfit.Parameter('lambda', -0.01, -0.05, -0.001)\n",
+    "lam = zfit.Parameter('lambda', -0.01, -0.05, -0.00001)\n",
     "comb_bkg = zfit.pdf.Exponential(lam, obs=mass_obs)"
    ]
   },
@@ -1426,11 +1284,12 @@
    "outputs": [],
    "source": [
     "values = z.unstack_x(data)\n",
-    "obs_right_tail = zfit.Space('mass', (700, 1000))\n",
+    "obs_right_tail = zfit.Space('mass', (550, 1000))\n",
     "data_tail = zfit.Data.from_tensor(obs=obs_right_tail, tensor=values)\n",
-    "with comb_bkg.set_norm_range(obs_right_tail):\n",
-    "    nll_tail = zfit.loss.UnbinnedNLL(comb_bkg, data_tail)\n",
-    "    minimizer.minimize(nll_tail)"
+    "comb_bkg_right = comb_bkg.to_truncated(limits=obs_right_tail)  # this gets the normalization right\n",
+    "nll_tail = zfit.loss.UnbinnedNLL(comb_bkg_right, data_tail)\n",
+    "result_sideband = minimizer.minimize(nll_tail)\n",
+    "print(result_sideband)"
    ]
   },
   {
@@ -1617,7 +1476,7 @@
    },
    "outputs": [],
    "source": [
-    "result.hesse(method='minuit_hesse', name='hesse')"
+    "result.hesse(method='minuit_hesse', name='hesse')  # these are the default values"
    ]
   },
   {
@@ -1654,7 +1513,7 @@
    },
    "outputs": [],
    "source": [
-    "print(result.params)"
+    "print(result)"
    ]
   },
   {
@@ -1707,7 +1566,7 @@
    },
    "outputs": [],
    "source": [
-    "print(result.params)"
+    "print(result)"
    ]
   },
   {