Welcome to the projects section! Explore a selection of projects and notes from my various fields of interest. Click on the links below to access them.
This project applies linear regression from first principles to model the light-cone-like spread of correlations in a quantum spin system. Using data generated from quantum dynamics simulations, I identify the propagation front by fitting a linear model via the normal equation (closed-form solution). This forms the foundation for exploring more advanced techniques like polynomial regression, cross-validation, and regularization to capture nonlinear spreading behavior in future work.
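A minimal sketch of the closed-form fit, using synthetic front-arrival data in place of the simulation output (the linear front, velocity, and noise level are assumptions for illustration):

```python
import numpy as np

# Hypothetical stand-in for the simulation data: the correlation front
# arrives at site x at time t = x / v, plus noise (here v = 2 is assumed).
rng = np.random.default_rng(0)
x = np.arange(1, 51, dtype=float)
t = x / 2.0 + 0.1 * rng.standard_normal(x.size)

# Design matrix with a bias column, then the normal equation
# w = (X^T X)^{-1} X^T t, solved without explicit matrix inversion.
X = np.column_stack([np.ones_like(x), x])
w = np.linalg.solve(X.T @ X, X.T @ t)
intercept, slope = w
velocity = 1.0 / slope  # recovered propagation speed of the front
```

Solving the linear system directly (rather than forming the inverse) is the standard numerically stable way to evaluate the normal equation.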
In this project, I used polynomial regression to model how signals spread across a one-dimensional system over time by treating arrival time as a function of position. I tested polynomial models of different degrees (1 to 14) and selected the best one using a validation set to avoid overfitting. This approach captured the nonlinear, sublinear growth behavior seen in the data and extended traditional linear light-cone models by fitting nonlinear fronts similar to those found in quantum dynamics. The result is a simple, interpretable model that effectively describes the causal structure of signal propagation using principled model selection.
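The degree-selection step can be sketched as follows, with synthetic sublinear data standing in for the real arrival times (the square-root front shape and split sizes are assumptions):

```python
import numpy as np

# Assumed sublinear front: arrival time grows like sqrt(position), plus noise.
rng = np.random.default_rng(1)
x = np.linspace(1, 50, 120)
t = np.sqrt(x) + 0.05 * rng.standard_normal(x.size)

# Hold out a validation set for model selection, as described above.
idx = rng.permutation(x.size)
tr, va = idx[:80], idx[80:]

# Fit polynomials of degree 1..14 on the training split and score each
# on the validation split; the lowest validation MSE wins.
val_err = {}
for deg in range(1, 15):
    coef = np.polyfit(x[tr], t[tr], deg)
    pred = np.polyval(coef, x[va])
    val_err[deg] = np.mean((pred - t[va]) ** 2)

best = min(val_err, key=val_err.get)
```

Because the underlying front is curved, a straight line underfits and the validation set should prefer a degree above 1, while very high degrees are penalised by their poor generalisation to held-out points.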
This project explores logistic regression applied to different datasets and features, illustrating how model performance relates to feature correlation with the target. First, logistic regression models predict heart disease risk using cholesterol and maximum heart rate (thalach). The corresponding correlation matrices show the strength of each feature's relationship with the outcome. Next, we analyze the petal width feature for classifying iris species, along with its correlation matrix, demonstrating how higher correlation typically leads to better classification. Finally, the decision boundary plot for Iris Virginica visualizes how logistic regression separates classes in feature space. Here are a few plots from the project:
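A minimal sketch of the one-feature classifier and its decision boundary, with synthetic values standing in for petal width (the class means and spreads below are assumptions, not the real iris measurements):

```python
import numpy as np

# Hypothetical stand-in for petal width: the positive class tends to
# have larger values, mimicking Iris Virginica vs. the rest.
rng = np.random.default_rng(2)
x0 = rng.normal(1.3, 0.20, 100)   # negative class
x1 = rng.normal(2.0, 0.25, 100)   # positive class
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Plain gradient descent on the log-loss.
X = np.column_stack([np.ones_like(x), x])
w = np.zeros(2)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid
    w -= 0.1 * X.T @ (p - y) / y.size    # gradient of the mean log-loss

# The decision boundary is where the predicted probability crosses 0.5,
# i.e. where w0 + w1 * x = 0.
boundary = -w[0] / w[1]
acc = np.mean((X @ w > 0) == y)
```

The boundary lands between the two class means, which is exactly what the decision-boundary plot in the project visualises in feature space.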
This project extends the previous model to multiple features, such as cholesterol, age, blood pressure, and ten others, to predict the likelihood of heart disease. The model uses multi-variable logistic regression trained with gradient descent, and its effectiveness is evaluated with metrics such as ROC-AUC, the confusion matrix, and the precision-recall curve.
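The multi-feature training loop and the ROC-AUC computation can be sketched as below, using random synthetic features in place of the clinical data (the feature count, coefficients, and sample size are all assumptions); the AUC is computed from scratch via its rank-based (Mann-Whitney) formulation rather than a library call:

```python
import numpy as np

# Synthetic stand-in for the multi-feature heart-disease data (assumed).
rng = np.random.default_rng(3)
n, d = 400, 5
X = rng.standard_normal((n, d))
true_w = np.array([1.5, -1.0, 0.8, 0.0, 0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ true_w)))).astype(float)

# Multi-variable logistic regression trained by gradient descent.
Xb = np.column_stack([np.ones(n), X])
w = np.zeros(d + 1)
for _ in range(3000):
    p = 1 / (1 + np.exp(-(Xb @ w)))
    w -= 0.1 * Xb.T @ (p - y) / n

# ROC-AUC via ranks: AUC = (R1 - n1(n1+1)/2) / (n1 * n0), where R1 is the
# sum of the ranks of the positive-class scores.
scores = Xb @ w
ranks = scores.argsort().argsort() + 1.0   # 1-based ranks (no ties expected)
n1 = y.sum()
n0 = n - n1
auc = (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)
```

An AUC well above 0.5 confirms the fitted scores rank positives above negatives, which is the quantity the ROC curve summarises.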
This project explores housing price prediction in California using real-world data from the StatLib repository. The pipeline includes data preprocessing, exploratory data analysis, feature engineering, and multiple regression models. Linear regression, decision trees, and ensemble methods like random forests are evaluated and tuned using cross-validation to optimize performance. The goal is to accurately predict median house values and understand the impact of different features.
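The cross-validation step of the pipeline can be sketched from scratch, with a normal-equation linear model and random synthetic features standing in for the California housing data (the feature count, coefficients, and fold count are assumptions):

```python
import numpy as np

# Synthetic regression data standing in for the preprocessed housing features.
rng = np.random.default_rng(4)
n, d = 300, 4
X = rng.standard_normal((n, d))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + 0.3 * rng.standard_normal(n)

def fit(Xtr, ytr):
    # Closed-form least squares with a bias column.
    Xb = np.column_stack([np.ones(len(Xtr)), Xtr])
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ ytr)

def rmse(w, Xte, yte):
    Xb = np.column_stack([np.ones(len(Xte)), Xte])
    return np.sqrt(np.mean((Xb @ w - yte) ** 2))

# 5-fold cross-validation: each fold takes a turn as the held-out test set.
k = 5
folds = np.array_split(rng.permutation(n), k)
scores = []
for i in range(k):
    te = folds[i]
    tr = np.concatenate([folds[j] for j in range(k) if j != i])
    scores.append(rmse(fit(X[tr], y[tr]), X[te], y[te]))
cv_rmse = float(np.mean(scores))
```

Averaging the per-fold RMSE gives a performance estimate that is far less sensitive to a lucky train/test split, which is why the project tunes its models this way.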
This project applies Support Vector Machines to identify and classify different phases of a quantum system across a phase transition. By training on simulation data (e.g., Ising configurations or quantum states), the SVM learns a clear decision boundary separating phases.
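A from-scratch linear SVM trained with subgradient descent on the hinge loss, on a synthetic two-feature stand-in for the phase-labelled configurations (the cluster centres, regularisation strength, and step size are assumptions for illustration):

```python
import numpy as np

# Two well-separated clusters standing in for configurations from the
# two phases (assumed data, not real simulation output).
rng = np.random.default_rng(5)
n = 200
X = np.vstack([rng.normal(-1.0, 0.3, (n, 2)),   # "phase A" samples
               rng.normal(+1.0, 0.3, (n, 2))])  # "phase B" samples
y = np.concatenate([-np.ones(n), np.ones(n)])

# Minimise  lam/2 ||w||^2 + mean(max(0, 1 - y (w.x + b)))  by subgradient descent.
w, b, lam, eta = np.zeros(2), 0.0, 0.01, 0.1
for _ in range(2000):
    margins = y * (X @ w + b)
    viol = margins < 1                    # points inside or beyond the margin
    grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(0) / len(y)
    grad_b = -y[viol].sum() / len(y)
    w -= eta * grad_w
    b -= eta * grad_b

acc = np.mean(np.sign(X @ w + b) == y)
```

The learned hyperplane `w.x + b = 0` plays the role of the phase boundary in feature space; in the actual project the inputs would be simulation observables rather than random clusters.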
Implemented two eigenvalue applications: modeling web navigation with Markov matrices to predict long-run states, and applying PCA to a cat image dataset for dimensionality reduction, visualization, reconstruction, and explained-variance analysis.
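The Markov-matrix half of the project can be sketched as follows, with a small hypothetical three-page transition matrix (the entries are invented; the column-stochastic convention is an assumption): repeated application of the matrix drives any starting distribution to the eigenvector with eigenvalue 1.

```python
import numpy as np

# Hypothetical web-navigation chain: P[i, j] is the probability of moving
# from page j to page i, so each column sums to 1.
P = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.2, 0.3],
              [0.3, 0.4, 0.2]])

# Long-run prediction: start on page 0 and apply the chain repeatedly.
state = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    state = P @ state

# The limit is the eigenvector of P for eigenvalue 1, normalised to sum to 1.
vals, vecs = np.linalg.eig(P)
steady = np.real(vecs[:, np.argmax(np.real(vals))])
steady = steady / steady.sum()
```

By the Perron-Frobenius theorem a positive column-stochastic matrix has 1 as its largest eigenvalue, so the power iteration and the eigendecomposition agree on the long-run distribution.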
Built a complete logistic regression pipeline from scratch in NumPy, training on image data. Demonstrated understanding of optimization (gradient descent) and performance evaluation (accuracy, cost curves).
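The core of such a pipeline, including the cost curve used to monitor training, can be sketched on tiny synthetic "images" (random flattened vectors; the sample count, dimensionality, and learning rate are assumptions, and the real project trains on actual image data):

```python
import numpy as np

# Random flattened "images" standing in for the real dataset (assumed).
rng = np.random.default_rng(6)
n, d = 200, 64                            # e.g. 200 samples of 8x8 pixels
X = rng.standard_normal((n, d))
true_w = rng.standard_normal(d)
y = (X @ true_w > 0).astype(float)        # linearly separable labels

w, b, lr = np.zeros(d), 0.0, 0.5
costs = []
for _ in range(500):
    z = np.clip(X @ w + b, -30, 30)       # clip to keep exp() stable
    p = 1 / (1 + np.exp(-z))
    eps = 1e-12                           # guard against log(0)
    costs.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
    # Gradient-descent update on weights and bias.
    w -= lr * X.T @ (p - y) / n
    b -= lr * np.mean(p - y)

acc = np.mean((X @ w + b > 0) == y)
```

Plotting `costs` against the iteration index gives the monotonically decreasing cost curve used in the project to verify that gradient descent is converging.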