We aim to explore various surrogate models in this assignment. To keep things simple, we will only consider a one-dimensional example. Let's take the classic example of

$$y = (6x-2)^2 \sin{(12x-4)}, \quad x \in [0,1]$$
The function looks like this:
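The figure is reproduced by the following minimal snippet, adapted from the (commented-out) plotting code at the end of this page:

```python
import numpy as np
import matplotlib.pyplot as plt

# Evaluate the true function on a dense grid and plot it
x = np.linspace(0, 1, 200)
y = (6*x - 2)**2 * np.sin(12*x - 4)

plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('True function')
plt.show()
```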
Let's try to construct various surrogate models for this function. We will consider

- Linear regression with various basis functions
- Non-linear regression
- Piecewise cubic spline interpolation

and compare their performance.
## Sampling
Even though the functional form of the true function is known, we will assume that it is unknown from this point onwards. Therefore we need to sample the function.
### Task 1
Implement the following sampling techniques:
1. Random sampling (uniform) with 5, 10 and 15 samples
1. Equi-spaced sampling with 5, 10 and 15 samples
1. Latin hypercube sampling with 5, 10 and 15 samples
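Random and equi-spaced sampling are one-liners (`np.random.rand(n)` and `np.linspace(0, 1, n)`). Latin hypercube sampling is the least familiar of the three, so here is a minimal sketch using `scipy.stats.qmc` (the `seed` argument and the loop over sample sizes are illustrative choices, not part of the assignment):

```python
import numpy as np
from scipy.stats import qmc

def latin_hypercube_sampling(n, seed=None):
    # One stratum per sample on [0, 1]; scrambling randomizes the
    # position of each point within its stratum.
    sampler = qmc.LatinHypercube(d=1, scramble=True, seed=seed)
    return sampler.random(n)[:, 0]

for n in (5, 10, 15):
    print(n, np.sort(latin_hypercube_sampling(n, seed=0)))
```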
{{< include pythonScripts/Task_1/script_sampling.qmd >}}

## Linear Regression

### Task 2

Implement linear regression with the following basis functions:

1. $(1, x, x^2)$
1. $(1, x, x^2, x^3)$
1. $(1, x, x^2, x^3, x^4)$

{{< include pythonScripts/Task_2/script_polynomialRegression.qmd >}}

### Task 3

Implement linear regression with Chebyshev basis functions:

1. $(1, T_1(x), T_2(x))$
1. $(1, T_1(x), T_2(x), T_3(x), T_4(x))$

{{< include pythonScripts/Task_3/script_chebyshevRegression.qmd >}}

### Task 4

Implement linear regression with sine basis functions:

1. $(1, \sin(\pi x), \sin(2\pi x))$
1. $(1, \sin(\pi x), \sin(2\pi x), \sin(3\pi x))$
1. $(1, \sin(\pi x), \sin(2\pi x), \sin(3\pi x), \sin(4\pi x))$

{{< include pythonScripts/Task_4/script_sineBasisLR.qmd >}}
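Tasks 2–4 differ only in how the design matrix is built; the least-squares solve is the same each time. A minimal sketch under that observation (function names are illustrative; the Chebyshev basis assumes $x \in [0,1]$ is first mapped to $[-1,1]$, where the $T_k$ are defined):

```python
import numpy as np

def fit_least_squares(Phi, y):
    # Solve the normal equations Phi^T Phi beta = Phi^T y
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

def poly_basis(x, degree):
    # Columns: 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

def chebyshev_basis(x, degree):
    # Columns: T_0, T_1, ..., T_degree evaluated at 2x - 1
    return np.polynomial.chebyshev.chebvander(2.0 * x - 1.0, degree)

def sine_basis(x, k):
    # Columns: 1, sin(pi x), sin(2 pi x), ..., sin(k pi x)
    cols = [np.ones_like(x)] + [np.sin(j * np.pi * x) for j in range(1, k + 1)]
    return np.column_stack(cols)

# Example: cubic polynomial fit to 10 equi-spaced samples
x = np.linspace(0, 1, 10)
y = (6*x - 2)**2 * np.sin(12*x - 4)
beta = fit_least_squares(poly_basis(x, 3), y)
y_hat = poly_basis(x, 3) @ beta
```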
## Nonlinear Regression
### Task 5
Implement nonlinear regression for
$\hat{y} = \beta_1 + \beta_2 x^2 \sin(\beta_3 x + \beta_4)$
Then try
$\hat{y} = (\beta_1 x + \beta_2)^2 \sin(\beta_3 x + \beta_4)$
You will notice that the second model gives better results. This is expected, since the second model matches the functional form of the true function exactly. But remember that in the general case we need deep intuition to pick the correct functional form, and this is usually not possible. You will also notice that finding the $\beta$'s is much more difficult in the nonlinear case than in the earlier linear case.
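A minimal sketch of fitting the second model with `scipy.optimize.curve_fit` (the sample size and the initial guess `p0` are illustrative; nonlinear least squares can converge to a poor local minimum if the guess is far off, which is the difficulty noted above):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, b1, b2, b3, b4):
    # Second candidate form: (b1 x + b2)^2 sin(b3 x + b4)
    return (b1 * x + b2)**2 * np.sin(b3 * x + b4)

x = np.linspace(0, 1, 15)
y = (6*x - 2)**2 * np.sin(12*x - 4)

p0 = [1.0, -1.0, 10.0, -3.0]   # illustrative initial guess
beta, _ = curve_fit(model, x, y, p0=p0, maxfev=10000)
print('Fitted parameters:', beta)
```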
{{< include pythonScripts/Task_5/script_nonlinearR.qmd >}}
## Piecewise Cubic Spline Interpolation
### Task 6
This example is a bit more involved. You will need to learn about cubic spline interpolation, then use a suitable library to compute the interpolant.
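One suitable library routine is `scipy.interpolate.CubicSpline`; a minimal sketch (the node count and the dense evaluation grid are illustrative):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def true_function(x):
    return (6*x - 2)**2 * np.sin(12*x - 4)

x = np.linspace(0, 1, 10)        # interpolation nodes (equi-spaced here)
spline = CubicSpline(x, true_function(x))

xx = np.linspace(0, 1, 200)      # dense grid for evaluating the fit
mse = np.mean((true_function(xx) - spline(xx))**2)
print('Mean squared error:', mse)
```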
{{< include pythonScripts/Task_6/script_spline.qmd >}}
## Deliverables
You should deliver a `pdf` report and a zip file containing all the code.
In each case, you should report the mean squared error of the model with respect to the true function, and plot the true function together with the model.
Your general conclusions about the advantages and disadvantages of the various sampling techniques and regression models should be included in the report.
Fit each model with all the sampling techniques and compare the results.
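One way to organize the comparison is a small helper that scores any fitted surrogate against the true function on a dense grid (a sketch; the grid size and function names are illustrative):

```python
import numpy as np

def true_function(x):
    return (6*x - 2)**2 * np.sin(12*x - 4)

def mse_vs_true(predict, n_grid=500):
    # Evaluate the surrogate and the true function on a dense grid
    # and return the mean squared difference.
    xx = np.linspace(0, 1, n_grid)
    return np.mean((predict(xx) - true_function(xx))**2)
```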
## Submission
The submission deadline is 23:59 on 31st March. Email all the assignments to `ramkumars.24@res.iist.ac.in`.
The code for the sampling and linear regression examples above:

```{python}
import numpy as np
import matplotlib.pyplot as plt

# True function on [0, 1] (plotting left commented out, as in the original)
x = np.linspace(0, 1, 100)
y = (6*x - 2)**2 * np.sin(12*x - 4)
# plt.plot(x, y)
# plt.xlabel('x'); plt.ylabel('y')
# plt.title('True function'); plt.show()

# Task 1 - Sampling
def true_function(x):
    return (6*x - 2)**2 * np.sin(12*x - 4)

def random_sampling(n):
    return np.random.rand(n)

def equi_spaced_sampling(n):
    return np.linspace(0, 1, n)

def latin_hypercube_sampling(n):
    # Latin hypercube sampling via scipy.stats.qmc
    from scipy.stats import qmc
    sampler = qmc.LatinHypercube(d=1, scramble=True)
    return sampler.random(n)[:, 0]

# Plotting the samples for n=5 with all three methods (commented out)
# points = 5
# plt.plot(random_sampling(points), 'or', label='Random')
# plt.plot(equi_spaced_sampling(points), '+g', label='Equi-spaced')
# plt.plot(latin_hypercube_sampling(points), 'xb', label='Latin Hypercube')
# plt.xlabel('Data Points'); plt.ylabel('x')
# plt.grid(); plt.legend(); plt.title('Sampling'); plt.show()

# Task 2 - Linear Regression
def linear_regression(x, y, degree):
    # Least-squares polynomial fit via the normal equations
    X = np.vander(x, degree + 1, increasing=True)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta

def predict(x, beta):
    X = np.vander(x, len(beta), increasing=True)
    return X @ beta

# Linear regression example with 5 samples and degree 2
x = random_sampling(5)
y = true_function(x)
beta = linear_regression(x, y, 2)
y_hat = predict(x, beta)

# Mean squared error at the sample points
mse = np.mean((y - y_hat)**2)
print('Mean squared error:', mse)

xx = np.linspace(0, 1, 100)
plt.plot(xx, true_function(xx), 'r', label='True')
plt.plot(xx, predict(xx, beta), 'b', label='Predicted')
plt.plot(x, y_hat, 'og', label='Samples')
plt.xlabel('x'); plt.ylabel('y')
plt.legend(); plt.grid()
plt.title('Linear regression with degree 2')
plt.show()
```
Mean squared error: 1.0231030299631143