{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f6813295-8a74-4429-9102-777f4d7b2fe1",
   "metadata": {},
   "source": [
    "QENS Example\n",
    "------------\n",
    "\n",
    "This example is based on Quasi_Elastic Neutron Scattering (QENS) data as discussed by __[Sivia et al](https://www.sciencedirect.com/science/article/pii/092145269290036R?via=ihub)__. It is used to examine a variety of molecular motions within a sample. For example, diffusion hopping, rotation modes of molecules and electronic transitions. This example uses real data collected at the ISIS neutron and muon source.\n",
    "\n",
    "This example will demonstrate the `QLData` workflow for determining the number of Lorentzians in a sample. The first step is to import the correct packages. From `quickBayes` there are three imports; \n",
    "- The workflow `QLData`\n",
    "- The fitting function `QlDataFunction`\n",
    "- A function for getting the background term from a string `get_background_function`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "78f81a55-ff50-4afc-b12b-b35ac6b0b665",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "from quickBayes.workflow.model_selection.QlData import QLData\n",
    "from quickBayes.functions.qldata_function import QlDataFunction\n",
    "from quickBayes.utils.general import get_background_function\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b30df950-1ab9-434f-9ba0-8c0655efdd77",
   "metadata": {},
   "source": [
    "The data is contained within the test's for `quickBayes`. Analysing QENS data requires both the sample measurements and the resolution. The resolution is used to reduce the background noise to zero, making it easier to identify the functional form of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1474d49d-98f6-45d3-a4bb-4f0abca135cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "DATA_DIR = os.path.join('..', '..', '..', 'test', 'shared', 'data')\n",
    "\n",
    "sample_file = os.path.join(DATA_DIR, 'sample_data_red.npy')\n",
    "resolution_file = os.path.join(DATA_DIR, 'resolution_data_red.npy')\n",
    "\n",
    "sx, sy, se = np.load(sample_file)\n",
    "rx, ry, re = np.load(resolution_file)\n",
    "\n",
    "resolution = {'x': rx, 'y': ry}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67ec7c11-b8eb-4717-b809-9ff1f4f04635",
   "metadata": {},
   "source": [
    "It would be helpful to have a look at the sample data. It shows a peak near to zero, which tapes off after about $0.2$ (in either direction). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66583ecd-f9b5-4b17-8455-b003919d177a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "fig, ax = plt.subplots()\n",
    "ax.errorbar(sx, sy, se, fmt='kx', label='data');\n",
    "ax.set(xlabel='Energy Transfer ($meV$)', ylabel='Response', title='QENS data');\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27ae815d-045a-4f11-a7dd-8aa186a945c2",
   "metadata": {},
   "source": [
    "The next step is to set the problem parameters. The `results` and `results_errors` are empty as we are doing a fresh calculation. The start value is chosen to be $-0.4$ and the end value is $0.4$, this is to make sure that there is some data that can be used for fitting the background values. The last value is the maximum number of Lorentzians to consider, in this case three."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c19ce688-3e03-4ef8-bbde-53cabbe334c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "results = {}\n",
    "results_errors = {}\n",
    "start_x = -0.4\n",
    "end_x = 0.4\n",
    "max_peaks = 3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "826f6dfe-27f3-43fa-b1f9-a91a6fa73eea",
   "metadata": {},
   "source": [
    "The next step is to create a workflow object. This hides the complexity of the calculations and streamlines the code. For QENS data it is necessary to use `preprocess_data`, which interpolates the sample and resolution data onto the same x values. Hence, it returns the new `x` values and the corresponding resolution values (`ry`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2496f64-b450-49a2-997c-89f40c93ecca",
   "metadata": {},
   "outputs": [],
   "source": [
    "workflow = QLData(results, results_errors)\n",
    "new_x, ry = workflow.preprocess_data(sx, sy, se, start_x, end_x, resolution)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46445a92-10fd-410e-a33b-e1f2a48ebe10",
   "metadata": {},
   "source": [
    "QENS data can measure both the elastic and inelastic lines. We know that this data includes an elastic line, so it is set to true. This corresponds to a delta function in the fitting function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0cfc5b3-1b69-4769-8c3d-0a15e91e211e",
   "metadata": {},
   "outputs": [],
   "source": [
    "elastic = True"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aea5cef9-7198-42a9-b39b-ffec94514127",
   "metadata": {},
   "source": [
    "The background is probably linear. The full fitting function is contained within `QlDataFunction`, which can hold an arbitrary number of Lorentzians. For N Lorentzians $L$ with a background function $BG(E)$ and an elastic line, the function is\n",
    "\n",
    "$$\n",
    "y(E) = BG(E) + R(E)\\circledast (A_0*\\delta(E) + \\sum_{j=1}^N L_j(E)),\n",
    "$$\n",
    "where, $E$ is the energy, $R$ is the resolution function, and $A_0$ is the amplitude of the delta function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90c96e2e-66fd-4edd-ad99-3cc69e0b8f60",
   "metadata": {},
   "outputs": [],
   "source": [
    "BG = get_background_function('linear')\n",
    "function = QlDataFunction(BG, elastic, new_x, ry, start_x, end_x)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c687add-8df4-4b96-9c6f-a839746a39d7",
   "metadata": {},
   "source": [
    "The function has some default bounds and starting guess. These are good enough for this tutorial. They are used to set the fitting engine in the workflow, at present there two available fitting engines are `scipy` and `gofit` (global optimization, see __[here for more details](https://ralna.github.io/GOFit/_build/html/index.html)__)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f2bbcd1-c1cf-4ff7-bffd-b2955f0930d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "lower, upper = function.get_bounds()\n",
    "guess = function.get_guess()\n",
    "lower, upper = function.get_bounds()\n",
    "workflow.set_scipy_engine(guess, lower, upper)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1e449fb-63ef-4b58-90ad-32833872d6f4",
   "metadata": {},
   "source": [
    "The workflow has an `execute` method, which does the required computations. This includes updating the fitting function to have the correct number of Lorentzians. It then returns the final fitting function, in this case a function containing three Lorentzians. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a056909f-02ad-46ff-ac00-f7b0c567da6f",
   "metadata": {},
   "outputs": [],
   "source": [
    "function_out = workflow.execute(max_peaks, function, guess)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c3ff3f6-5cc8-4266-a819-9f04f9350d07",
   "metadata": {},
   "source": [
    "The most likely function is the one with the largest logarithm of the posterior (loglikelihoods). This information is contained within the results dictionary, which can easily be accessed from the workflow. The following code will print out just the values for the loglikelihoods, where `Nx` indicates that it contains `x` Lorentzians."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa14c886-ddb8-4f57-b6d3-e003d023f565",
   "metadata": {},
   "outputs": [],
   "source": [
    "results, results_errors = workflow.get_parameters_and_errors\n",
    "for key in results.keys():\n",
    "    if 'log' in key:\n",
    "        print(key, results[key])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58ca82f9-f5fb-4ad2-b94f-edd0edf08905",
   "metadata": {},
   "source": [
    "It is slightly more likely to have two Lorentzians than one. It could be helpful to visualise the fits from these calculations. The `fit_engine` contains the important information about the fits, including the results. The `get_fit_values` method takes an argument of the index of the fit (so in this case an index of `0` is one Lorentzian). It then returns:\n",
    "- The x data\n",
    "- The fitted y values\n",
    "- The errors on the fit\n",
    "- The difference between the fit and the original data (not interpolated)\n",
    "- The errors on the differences\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ae8ba33-c946-4971-aa86-4cefb3bab16d",
   "metadata": {},
   "outputs": [],
   "source": [
    "fit_engine = workflow.fit_engine\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64e41446-cd87-424f-ae0f-32fb84139ce7",
   "metadata": {},
   "source": [
    "Lets look at each of the results one at a time, the range is reduced to give a better focus on the peak shape. First is a single Lorentzian."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70a6c7c1-4b6c-4706-8126-13e6aa4ae50d",
   "metadata": {},
   "outputs": [],
   "source": [
    "x_data, y_data, e_data, df, de = fit_engine.get_fit_values(0)\n",
    "fig, ax = plt.subplots()\n",
    "ax.errorbar(sx, sy, se, fmt='k-', label='data')\n",
    "plt.plot(x_data, y_data, label='fit 1 Lorentzian')\n",
    "plt.plot(x_data, df, label='difference')\n",
    "plt.legend()\n",
    "ax.set(xlabel='Energy Transfer ($meV$)', ylabel='Response', title='QENS data');\n",
    "ax.set_xlim([-.2, .2]);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa779ef2-2084-45d7-894c-281a8d05dcb0",
   "metadata": {},
   "source": [
    "The fit appears to overestimate the top of the peak and underestimate just before the peak centre. This is further shown by the difference having a dip and then bump near to zero. This is not a very good fit, but it did have the worst loglikelihood value of `-659`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ed85309-445a-465f-8fd6-16e79d411112",
   "metadata": {},
   "outputs": [],
   "source": [
    "x_data, y_data, e_data, df, de = fit_engine.get_fit_values(1)\n",
    "fig, ax = plt.subplots()\n",
    "ax.errorbar(sx, sy, se, fmt='k-', label='data')\n",
    "plt.plot(x_data, y_data, label='fit fir 2 Lorentzians')\n",
    "plt.plot(x_data, df, label='difference')\n",
    "plt.legend()\n",
    "ax.set(xlabel='Energy Transfer ($meV$)', ylabel='Response', title='QENS data');\n",
    "ax.set_xlim([-.2, .2]);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7024628d-f6ad-405d-bbc5-5b1f2da8d998",
   "metadata": {},
   "source": [
    "For two Lorentzians there does not seem to be any significant deviation between the data and the fit. This is expected as it had the best loglikelihood value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40a0d199-6fe9-49c1-a913-9ec45abd17cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "x_data, y_data, e_data, df, de = fit_engine.get_fit_values(2)\n",
    "fig, ax = plt.subplots()\n",
    "ax.errorbar(sx, sy, se, fmt='k-', label='data')\n",
    "plt.plot(x_data, y_data, label='fit')\n",
    "plt.plot(x_data, df, label='difference')\n",
    "plt.legend()\n",
    "ax.set(xlabel='Energy Transfer ($meV$)', ylabel='Response', title='QENS data');\n",
    "ax.set_xlim([-.2, .2]);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f9fca11-4601-4e86-85da-9529400a3390",
   "metadata": {},
   "source": [
    "The three Lorentzians also produces a plot with no significant deviations. The logliklihood was only marginally worse than for two Lorentzians.\n",
    "\n",
    "The last thing that might be of interest are the fit parameters and errors. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ed8662c-2ae6-4650-9875-e1e6667adb63",
   "metadata": {},
   "outputs": [],
   "source": [
    "for key in results.keys():\n",
    "    if 'N2' in key and 'log' not in key:\n",
    "        print(key, results[key], results_errors[key])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}