Level | Paper Title | Project Idea | Project Idea Reasoning | Rating | RD | Volatility | Score | Opponents | Active | Primary Area |
---|---|---|---|---|---|---|---|---|---|---|
undergraduate | $\texttt{BirdSet}$: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics | Implement and evaluate different data augmentation strategies (beyond those mentioned in the paper, such as pitch shifting, time stretching, adding different types of environmental noise) on a subset of the BirdSet training data. Train a standard convolutional neural network (CNN) model with and without these augmentations and compare their performance on the BirdSet evaluation data. The deliverable is a report analyzing the impact of different data augmentation strategies on model performance, including metrics such as precision, recall, and F1-score. (An illustrative augmentation sketch appears after this table.) | This project builds upon the provided benchmark by investigating the impact of different data augmentation techniques. It requires basic programming skills and an understanding of machine learning concepts, appropriate for an undergraduate level. Success will be measured by quantitative results showing the impact of different augmentations. | 2180.54 | 148.06 | 0.06 | 7 | | Yes | datasets and benchmarks |
undergraduate | Instance-dependent Early Stopping | Implement Instance-dependent Early Stopping (IES) in Python using a popular machine learning library (e.g., TensorFlow, PyTorch). Evaluate the performance of IES on at least three different datasets (e.g., MNIST, CIFAR-10, Fashion-MNIST) and compare it to standard training without early stopping and with conventional early stopping. Analyze the impact of different IES parameters (threshold δ, order of difference) on training time and model accuracy. The deliverables are well-documented Python code implementing IES, a report summarizing the experimental results, and a presentation of the findings. | This project is suitable for an undergraduate student as it provides hands-on experience with implementing and evaluating a cutting-edge machine learning technique. It requires basic coding skills (Python) and familiarity with machine learning concepts. The project builds directly on the paper's methodology, allowing the student to explore its effectiveness across different datasets. The technical complexity is manageable, and resources (libraries, datasets) are readily available. | 2180.54 | 148.06 | 0.06 | 7 | | Yes | unsupervised, self-supervised, semi-supervised, and supervised representation learning |
undergraduate | Shape as Line Segments: Accurate and Flexible Implicit Surface Representation | Implement the Edge-based Dual Contouring (E-DC) algorithm for surface extraction from a simplified 2D Line Segment Field. Test the implementation on a set of synthetic 2D shapes and evaluate its accuracy by comparing the extracted contours to the ground truth shapes. Deliverables include a working implementation of E-DC in 2D, a report documenting the algorithm and results, and a quantitative evaluation of the reconstruction accuracy. | This project replicates a core component of the paper (E-DC) but applies it to a simplified scenario, making it suitable for an undergraduate level. It allows students to gain practical experience with implementing geometric algorithms and evaluating their performance, without requiring advanced deep learning expertise. It builds directly on the paper's methodology but in a more constrained environment. | 2180.54 | 148.06 | 0.06 | 7 | | Yes | learning on graphs and other geometries & topologies |
undergraduate | This paper was rejected | Implement and compare the performance of several link prediction algorithms (e.g., Common Neighbors, Adamic-Adar, Resource Allocation) on both the original and degree-corrected benchmarks. Use a publicly available graph dataset (e.g., a social network or citation network). Implement the degree-corrected sampling method as described in the paper. Evaluate the algorithms using standard metrics (AUC-ROC, precision, recall) and compare the results. Analyze the runtime performance of the algorithms with and without degree correction. The deliverable is a well-documented code repository and a report summarizing the findings, including performance comparisons and runtime analysis. (A minimal link-prediction evaluation sketch appears after this table.) | This project is appropriate for an undergraduate student as it involves implementing and testing existing algorithms, but does not require highly advanced theoretical knowledge. It offers hands-on experience with graph data and algorithm evaluation, reinforcing concepts learned in coursework. | 2180.54 | 148.06 | 0.06 | 7 | | Yes | |
undergraduate | This paper was rejected | Implement the BEAR and PCSE algorithms in Python using standard libraries (e.g., OpenAI Gym for environments, PyTorch for models). Compare the performance of these algorithms with a baseline exploration strategy (e.g., random exploration, epsilon-greedy) on a set of benchmark gridworld environments. Deliverables: A report detailing the implementation, experimental results (including learning curves and performance metrics), and a discussion of the observed differences between the algorithms. | This project involves implementing and comparing the core algorithms presented in the paper. This is suitable for an undergraduate student as it reinforces their understanding of the theoretical concepts through practical implementation and provides hands-on experience with reinforcement learning techniques. The required resources are standard software libraries, and the task is well-defined. | 2180.54 | 148.06 | 0.06 | 7 | | Yes | |
undergraduate | This paper was rejected | Implementing and Analyzing Layer-wise Prediction Maps for ViT on CIFAR-10. Implement the Prediction Map generation mechanism described in the paper for a standard pre-trained Vision Transformer (e.g., ViT-Base) using the CIFAR-10 dataset. Apply the classification head to patch tokens from different layers (e.g., early, middle, late). Visualize and compare the generated Prediction Maps for several classes across different layers for selected input images. Analyze how the maps change with layer depth and discuss whether deeper layers provide more semantically meaningful localization as suggested by the paper. Deliverable: A documented Python codebase implementing Prediction Map generation and a report presenting visualized results, analysis of layer-wise differences, and comparison with the paper's findings. | This project provides hands-on experience with the core concepts of the paper at an appropriate undergraduate level. It involves implementing the basic Prediction Map generation, requiring Python programming and familiarity with deep learning frameworks (PyTorch/TensorFlow) and ViT architecture, skills typically acquired in undergraduate AI/ML courses. Comparing maps from different layers reinforces understanding of feature abstraction in deep networks. Using a standard, manageable dataset like CIFAR-10 makes it feasible within a semester. The deliverable (code + report) assesses both implementation skills and analytical understanding. It avoids the complexity of the full PredicAtt implementation or theoretical extensions. | 2180.54 | 148.06 | 0.06 | 7 | | Yes | |
undergraduate | Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control | Implement and Evaluate Mean Feature Dictionaries for a Simplified Task. Replicate the 'mean feature dictionary' computation (Section 4.2) for a simplified version of the IOI task (e.g., fewer names, fixed sentence structure) or a different simple algorithmic task solvable by a small transformer (e.g., sequence reversal). Use a pre-trained small model (like GPT-2 Small or a smaller variant). Implement the 'sufficiency' evaluation metric (Section 4.3, Figure 3 Left) by calculating the logit difference recovery when patching activations with the computed mean feature reconstructions at a specific layer/head output identified as important for the task. Deliverable: Python code implementing mean feature dictionary computation and the sufficiency test, a report detailing the experimental setup, results (sufficiency scores), and a discussion comparing findings to the paper's results for the supervised dictionaries on the IOI task. | This project allows undergraduates to engage directly with the paper's core methodology (supervised feature dictionaries and basic evaluation) but on a smaller, more manageable scale. Implementing mean feature dictionaries and the sufficiency test provides hands-on experience with the concepts and basic coding/data analysis skills relevant to interpretability research. | 2129.55 | 151.05 | 0.06 | 7 | | Yes | interpretability and explainable AI |
undergraduate | ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks | Implement the ConFIG algorithm in Python using a standard deep learning library (e.g., PyTorch, TensorFlow) and apply it to solve a 1D time-dependent PDE, such as the heat equation with both initial and boundary conditions. Compare the performance of ConFIG with standard optimization methods (e.g., Adam with different fixed weighting schemes for the initial and boundary condition losses). Analyze the convergence behavior and the accuracy of the solution. Implement different weighting schemes and different functions within ConFIG (such as those explored in Section 3 of the original paper). Deliverable: Well-documented Python code implementing ConFIG and a report comparing its performance with baseline methods, including plots of the solution and convergence curves. | This project offers a hands-on implementation of ConFIG on a well-defined problem, building practical coding skills. It is suitable for an undergraduate student who has basic programming knowledge and has taken courses on numerical methods and differential equations. The use of a standard library simplifies the implementation, allowing the student to focus on the core algorithm. The deliverable is a working implementation with a clear evaluation. | 2067.99 | 148.06 | 0.06 | 6 | | Yes | applications to physical sciences (physics, chemistry, biology, etc.) |
undergraduate | UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models | Develop a Python-based tool to analyze LLM responses on UGMathBench, automatically classifying errors into predefined categories (e.g., calculation error, logical error, misunderstanding of problem) based on the problem structure and expected solution. The tool should generate visualizations of error distributions across subjects, difficulty levels, and LLM models. | An undergraduate student with programming and math skills can contribute by developing tools to analyze LLM performance on UGMathBench in more detail. This could involve creating software to automatically classify different types of errors made by LLMs, visualize performance across different subjects and difficulty levels, or correlate LLM performance with specific problem characteristics. This combines technical skills with understanding of the benchmark. The deliverable would be a functional software tool and a report analyzing LLM performance using the tool. | 2067.99 | 148.06 | 0.06 | 6 | | Yes | datasets and benchmarks |
undergraduate | Provably Accurate Shapley Value Estimation via Leverage Score Sampling | Implement Leverage SHAP and Kernel SHAP in Python (using libraries like NumPy and scikit-learn). Compare their performance (accuracy and computational time) on various benchmark datasets (e.g., those used in the paper, or others from the UCI Machine Learning Repository). Analyze how the performance varies with different parameters (e.g., number of samples, dataset size, feature correlation). Deliverables: a working implementation of both algorithms, a report documenting the experimental setup, results (including tables and graphs), and a discussion of the observed differences. | Undergraduates can implement and evaluate Leverage SHAP on different datasets, comparing its performance with Kernel SHAP. This provides practical experience with algorithm implementation and evaluation, reinforcing concepts learned in class. | 2067.99 | 148.06 | 0.06 | 6 | | Yes | interpretability and explainable AI |
undergraduate | CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models | Replicate a subset of the CEB evaluation on two new open-source LLMs not included in the original paper. Select three bias types, two social groups, and two tasks from the CEB benchmark. Evaluate the chosen LLMs using the appropriate metrics and compare the results with those reported in the paper. The project deliverable is a report documenting the experimental setup, the results, a comparison with the original findings, and a discussion of potential reasons for any observed differences in bias scores. Use Python and the original CEB code. | This project is suitable for an undergraduate because it involves applying existing tools (the CEB benchmark) and methods (statistical analysis) to a well-defined problem. It builds directly on the paper's methodology and provides practical experience in evaluating LLMs. The scope is manageable for an undergraduate research project, but it still offers valuable insights into LLM fairness. This project *does* require coding (Python and basic statistical packages). | 2067.99 | 148.06 | 0.06 | 6 | | Yes | datasets and benchmarks |
undergraduate | Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable? | Investigating the Impact of Activation Functions on Circuit Multiplicity in XOR MLPs. Task: Replicate the paper's 'where-then-what' circuit search methodology (enumerating subgraphs and testing for perfect XOR replication) on small MLPs (e.g., 2-3-3-1 architecture). Systematically train and analyze multiple MLP instances for each of several different activation functions (e.g., Sigmoid, Tanh, ReLU, GeLU). Compare the number, average size/sparsity, and characteristics of 'perfect circuits' found for each activation function type. Deliverable: A research report detailing the experimental setup, code implementation, comparative results (graphs/tables showing circuit counts vs. activation function), and discussion on how activation functions influence the landscape of possible computational pathways. (A minimal training-loop sketch for the XOR MLPs appears after this table.) | This project provides hands-on experience with the paper's methodology ('where-then-what' circuit search) and core findings (existence of multiple circuits) within a manageable scope for an undergraduate project. It requires coding (Python, ML libraries) and understanding of basic ML concepts (MLPs, activation functions). By systematically varying a specific parameter (activation function) not explored in detail in the paper, it offers an opportunity for novel empirical investigation and comparative analysis, building directly on the paper's experiments. | 2067.99 | 148.06 | 0.06 | 6 | | Yes | interpretability and explainable AI |
undergraduate | This paper was rejected | Replicate and extend the sentence importance regression analysis on scientific abstracts. 1. Collect a dataset of scientific abstracts (e.g., from arXiv using its API) across different fields (e.g., CS, Physics, Biology). 2. For each abstract, segment it into sentences. 3. Use a pre-trained sentence embedding model (e.g., from the sentence-transformers library, testing one APE/RoPE model and one ALiBi model as per the paper). 4. Generate embeddings for each sentence and the full abstract. 5. Implement the Ordinary Least Squares (OLS) regression described in Section 4.1 of the paper to reconstruct the abstract embedding from its sentence embeddings. 6. Analyze the resulting regression coefficients (normalized) as sentence importance weights, similar to Section 4.2. 7. Visualize the relationship between sentence position (normalized) and coefficient weight, potentially faceting by abstract section (if identifiable, e.g., background, methods, results) or scientific field. Deliverables: 1. Python code for data collection, embedding generation, and regression analysis. 2. A report presenting the R-squared values (reconstruction quality) and visualizations of coefficient vs. position. 3. Discussion comparing the observed positional bias patterns between the chosen models and across different scientific fields or abstract sections. (An illustrative sketch of the regression step appears after this table.) | This project allows students to apply concepts from ML/NLP coursework (embeddings, cosine similarity, regression) and practice technical skills (using libraries like sentence-transformers, pandas, scikit-learn). It replicates a key part of the paper's methodology (regression analysis) on a new, accessible dataset (e.g., scientific abstracts), providing hands-on experience with data analysis and interpretation within a defined scope suitable for an undergraduate project or thesis. | 2067.99 | 148.06 | 0.06 | 6 | | Yes | |
undergraduate | Language Representations Can be What Recommenders Need: Findings and Potentials | Implement a simplified version of the linear mapping approach described in the paper, using a pre-trained LM (e.g., a smaller BERT model from Hugging Face) to generate embeddings for movie titles. Use a publicly available dataset (e.g., MovieLens 100K). Train a linear mapping matrix to predict user ratings. Compare the performance (e.g., using RMSE or NDCG) of this simplified model with a basic collaborative filtering model (e.g., matrix factorization) implemented using a standard library (like Surprise). The deliverable is a working Python implementation, a report comparing the performance of the two models, and an analysis of the results, including error analysis and potential improvements. | This project allows an undergraduate student to implement and evaluate a simplified version of the core algorithm in the paper, building a foundational understanding of LM-based recommendation systems. It involves coding but focuses on a manageable subset of the paper's complexity, allowing the student to gain practical experience. | 2052.00 | 148.06 | 0.06 | 6 | | Yes | other topics in machine learning (i.e., none of the above) |
undergraduate | Approximation algorithms for combinatorial optimization with predictions | Implement prediction-augmented algorithms for the Set Cover problem and compare their performance against standard approximation algorithms on various datasets. Input: Benchmark Set Cover instances and synthetic data with varying levels of prediction accuracy. Output: Experimental results comparing the solution quality and running time of different algorithms. Technology: Python with libraries like NumPy and SciPy. Deliverables: A report detailing the implementation, experimental setup, results, and analysis, including graphs and statistical analysis. (A minimal greedy set-cover sketch with a prediction warm start appears after this table.) | This project involves implementing and comparing algorithms, requiring coding skills and an understanding of algorithm analysis. It also allows for experimentation with different data sets and prediction models. | 2052.00 | 148.06 | 0.06 | 6 | | Yes | optimization |
undergraduate | This paper was rejected | Implement and evaluate the performance of two reinforcement learning algorithms (e.g., REINFORCE, Actor-Critic) using Gymnasium's functional API (FuncEnv) on several of the built-in environments (e.g., CartPole, MountainCar). The deliverables include the algorithm implementations, a comparative performance analysis report (including metrics like learning speed, final reward), and a presentation summarizing the findings. | This project uses Gymnasium's functional API (FuncEnv) to implement and compare RL algorithms that Gymnasium does not provide out of the box. This is appropriate for an undergraduate as it builds on learned RL concepts and requires implementing known algorithms, enhancing practical coding and analytical skills. It directly relates to the paper by using the FuncEnv feature. | 2052.00 | 148.06 | 0.06 | 6 | | Yes | |
undergraduate | This paper was rejected | Replicate the core experiments of the paper using different model architectures (e.g., Transformers, CNNs with different depths and widths) and datasets (e.g., ImageNet, text datasets). Analyze whether the findings regarding over-parameterization and training time hold true across these variations. Deliverables include a report detailing the experimental setup, results (including visualizations), a comparison with the original paper's findings, and a code repository. | This project builds on the paper's methodology by testing it on different architectures and datasets. It's suitable for an undergraduate student as it reinforces concepts learned in class (model architectures, training procedures, data augmentation) and provides practical experience with deep learning frameworks. It's achievable with readily available resources and offers a clear, measurable outcome. | 2052.00 | 148.06 | 0.06 | 6 | | Yes | |
undergraduate | This paper was rejected | Hyperparameter Sensitivity Analysis of XOR Representation Completeness: Replicate the multi-digit XOR experiment (Section 3.2, Figure 2). Implement the model architecture (Embedding + 2-layer MLP) and the 'completeness' metric g(a). Verify replication by achieving similar results to the paper. Then, systematically vary key hyperparameters: 1) Weight decay strength (e.g., across several orders of magnitude around the paper's value). 2) MLP hidden layer width (e.g., 50, 100, 200). For each hyperparameter setting, run multiple training trials with different random seeds. Analyze how these variations affect the predictive power of the initial 'completeness' signal (i.e., does the learned representation still consistently rank in the top % of initial signals? How does the average rank change?). Deliverable: Python codebase for the model, training, metric calculation, and experiments. A report summarizing the replication results and analyzing the impact of hyperparameters on the correlation between initial completeness and the learned representation, illustrated with plots. | This project is suitable for undergraduates with basic ML coursework and coding skills. It involves replicating a core experiment (XOR) from the paper, providing hands-on experience with the methodology. The novelty comes from the systematic investigation of hyperparameter effects (weight decay, MLP width), which extends the paper's analysis by exploring the robustness and sensitivity of the completeness hypothesis to these common factors. It requires implementing the model, the completeness metric, running experiments, and analyzing results, reinforcing understanding of both the paper and general ML experimental practices. Deliverable is code and an analysis report. | 2052.00 | 148.06 | 0.06 | 6 | | Yes | |
undergraduate | This paper was rejected | Implementing and Evaluating FuRud-Inspired Corrections on a New Text Classification Task. 1. Select a standard multi-class text classification dataset not used in the paper (e.g., 20 Newsgroups, Reuters). 2. Obtain baseline ICL probabilities for this dataset using a pre-trained LLM (e.g., via an API or a smaller local model). 3. Assume a set of 'learned' membership functions for each class (these could be taken from the paper's examples, like Figure 3, or assigned based on hypothetical bias scenarios). 4. Implement the Python code to apply these predefined triangular membership functions (Equation 2 & 3) to transform the test set probabilities. 5. Calculate and compare Accuracy and COBias before and after applying the FuRud-inspired corrections. Analyze the impact of different function choices for specific classes. Deliverable: A code repository (e.g., Jupyter Notebook) with the implementation and a report documenting the dataset, baseline performance, applied corrections, results (tables/plots comparing before/after), and analysis. | This project is suitable for undergraduates with basic programming skills (Python) and familiarity with machine learning concepts. It involves implementing a core part of the paper's methodology (applying learned fuzzy corrections) and evaluating it on a new dataset. Limiting the complexity by using pre-defined functions and focusing on the application/evaluation phase makes it manageable. It provides hands-on experience with bias metrics (Accuracy, COBias) and the practical effect of probability transformations, directly building on the paper's FuRud mechanism. | 2045.23 | 148.33 | 0.06 | 6 | | Yes | |
undergraduate | Improving Unsupervised Constituency Parsing via Maximizing Semantic Information | Implement a simplified version of the SemInfo-based PCFG induction model using a small, synthetic dataset of sentences and their paraphrases. Focus on implementing the bag-of-substrings representation and the PWI calculation. Compare the parsing accuracy of this model with a standard PCFG trained using maximum likelihood estimation. The deliverable is a working implementation and a report documenting the implementation details, experimental setup, results, and analysis. | This project requires implementing a simplified version of the SemInfo approach, reinforcing core concepts in NLP and allowing for hands-on experience with parsing algorithms. The use of a controlled, synthetic dataset simplifies the implementation and allows for a more focused analysis. The project is a simplified version of the paper's main method. | 2031.51 | 148.06 | 0.06 | 6 | | Yes | applications to computer vision, audio, language, and other modalities |
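For the BirdSet augmentation project above, a minimal sketch of waveform-level augmentations using PyTorch/torchaudio is shown below. The sample rate, pitch-shift amount, and SNR value are illustrative assumptions, not settings from the paper, and this is a starting point rather than a full training pipeline.

```python
# Sketch: candidate waveform augmentations (pitch shift, additive noise) for bird clips.
# Assumes mono clips loaded as torch tensors at an assumed 32 kHz sample rate.
import torch
import torchaudio

SAMPLE_RATE = 32_000  # assumed clip sample rate

pitch_shift = torchaudio.transforms.PitchShift(sample_rate=SAMPLE_RATE, n_steps=2)

def add_noise_at_snr(waveform: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Add white Gaussian noise so the result has roughly the requested SNR (dB)."""
    signal_power = waveform.pow(2).mean()
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = torch.randn_like(waveform) * noise_power.sqrt()
    return waveform + noise

def augment(waveform: torch.Tensor) -> torch.Tensor:
    """Apply one randomly chosen augmentation; returning the input unchanged is the baseline."""
    choice = torch.randint(0, 3, (1,)).item()
    if choice == 0:
        return pitch_shift(waveform)
    if choice == 1:
        return add_noise_at_snr(waveform, snr_db=10.0)
    return waveform  # no augmentation
```

The same CNN would then be trained once with `augment` applied to training clips and once without, and the evaluation metrics compared.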
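For the link-prediction project above, the evaluation loop can be prototyped with networkx heuristics and a held-out edge split. This is a sketch of the baseline (uncorrected) benchmark only; the paper's degree-corrected sampling is not shown, and the toy graph is just for illustration.

```python
# Sketch: score held-out edges with a link-prediction heuristic and compute AUC-ROC.
import random

import networkx as nx
from sklearn.metrics import roc_auc_score

def evaluate_heuristic(G: nx.Graph, test_frac: float = 0.1, seed: int = 0) -> float:
    rng = random.Random(seed)
    # Hold out a fraction of edges as positives; score against the residual graph.
    edges = list(G.edges())
    rng.shuffle(edges)
    n_test = int(test_frac * len(edges))
    positives = edges[:n_test]
    G_train = G.copy()
    G_train.remove_edges_from(positives)
    # Sample an equal number of node pairs that are non-edges as negatives.
    nodes = list(G)
    negatives = []
    while len(negatives) < n_test:
        u, v = rng.sample(nodes, 2)
        if not G.has_edge(u, v):
            negatives.append((u, v))
    pairs = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    # Adamic-Adar scores; swap in resource_allocation_index or a common-neighbor
    # count to compare heuristics.
    scores = [p for _, _, p in nx.adamic_adar_index(G_train, pairs)]
    return roc_auc_score(labels, scores)

print(evaluate_heuristic(nx.karate_club_graph()))
```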
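For the XOR circuit-multiplicity project above, the training step can look like the sketch below: train 2-3-3-1 MLPs under different activation functions and keep only instances that solve XOR perfectly. The optimizer, learning rate, and step count are illustrative choices, and the subgraph enumeration ('where-then-what' circuit search) is not shown here.

```python
# Sketch: train small MLPs on XOR with a selectable activation function.
import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

def train_xor_mlp(activation: nn.Module, seed: int, steps: int = 2000) -> tuple[nn.Module, bool]:
    """Train one 2-3-3-1 MLP instance; return it plus whether it classifies XOR perfectly."""
    torch.manual_seed(seed)
    model = nn.Sequential(
        nn.Linear(2, 3), activation,
        nn.Linear(3, 3), activation,
        nn.Linear(3, 1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    perfect = bool(((model(X) > 0).float() == y).all())
    return model, perfect

# Count how many seeds yield a perfect XOR solver for each activation function.
for name, act in {"relu": nn.ReLU(), "tanh": nn.Tanh(), "gelu": nn.GELU(), "sigmoid": nn.Sigmoid()}.items():
    successes = sum(train_xor_mlp(act, seed)[1] for seed in range(10))
    print(f"{name}: {successes}/10 runs solve XOR perfectly")
```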
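For the sentence-importance project above, the regression step can be prototyped in a few lines. This is a minimal sketch assuming the sentence-transformers library; the model name and the use of plain numpy least squares are illustrative choices, not the paper's exact setup.

```python
# Sketch: regress an abstract embedding on its sentence embeddings; coefficients act
# as sentence importance weights.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def sentence_importance(abstract: str, sentences: list[str]) -> tuple[np.ndarray, float]:
    """Return normalized OLS coefficients (importance weights) and an approximate R^2."""
    S = model.encode(sentences)            # shape: (n_sentences, dim)
    a = model.encode([abstract])[0]        # shape: (dim,)
    # Solve min_w ||S^T w - a||^2: each embedding dimension is one observation.
    coef, _, _, _ = np.linalg.lstsq(S.T, a, rcond=None)
    pred = S.T @ coef
    ss_res = float(((a - pred) ** 2).sum())
    ss_tot = float(((a - a.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot             # rough reconstruction quality
    weights = coef / np.abs(coef).sum()    # normalized importance weights
    return weights, r2
```

Plotting `weights` against normalized sentence position, per abstract, gives the positional-bias visualization described in the project.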
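For the set-cover project above, a reasonable starting point is the classical greedy approximation; one simple (assumed, not taken from the paper) way to incorporate predictions is to seed the greedy solution with predicted sets, as sketched below.

```python
# Sketch: greedy set cover with an optional prediction-based warm start.
from typing import Iterable

def greedy_set_cover(universe: set, sets: dict[str, set], preselected: Iterable[str] = ()) -> list[str]:
    """Greedy set cover; `preselected` optionally seeds the solution with predicted sets."""
    chosen = list(preselected)
    covered = set().union(*(sets[name] for name in chosen)) if chosen else set()
    while covered != universe:
        # Pick the set covering the most still-uncovered elements.
        best = max(sets, key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            raise ValueError("Universe cannot be covered by the given sets")
        chosen.append(best)
        covered |= sets[best]
    return chosen

# Toy example: compare plain greedy against a prediction-seeded run.
U = set(range(1, 8))
S = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {6, 7}, "E": {1, 5, 7}}
print(greedy_set_cover(U, S))                      # plain greedy
print(greedy_set_cover(U, S, preselected=["E"]))   # greedy warmed with a 'predicted' set
```

Measuring solution size and runtime while varying the accuracy of the predicted sets reproduces the kind of comparison the project calls for.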