A reproducibility checklist

Useful strategies I’ve used to make my research code more reproducible

Udesh Habaraduwa

2025-04-08

🐍 Python Best Practices

  1. Keep it simple: Think of the bachelor’s student who might eventually have to read your paper. Some coding patterns are fancy and fit on one line, yet they are often no more performant than a more verbose implementation. In a research setting, it is usually more useful to lean towards readability and to avoid premature optimization [1], [2].

  2. Use descriptive variable names: Avoid single-letter variables except for widely understood conventions (e.g., i for loop counters). num_epochs is better than n_e. participant_data is better than pd (which could be confused with pandas).

  3. Use type hints: Help others understand your code with Python type annotations. Document expected input/output types. Enable static type checking for early error detection (illustrated in the sketch after this list).

  4. Provide progress feedback: Use progress bars (tqdm) for long-running operations. Show estimated completion times for lengthy processes. Consider researchers waiting for results when designing feedback systems (also shown in the sketch after this list).

  5. Use version control meaningfully: Write descriptive commit messages. Tag releases that correspond to paper submissions or publications. Include a .gitignore file to avoid committing data, results, or credentials.

  6. Follow the Zen of Python [3]: “Explicit is better than implicit”. “Flat is better than nested”. “Readability counts”.

  7. Include examples: Provide notebooks or scripts demonstrating typical workflows. Include sample data when possible. Show expected outputs to help verify correct setup.
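
As a minimal sketch of points 3 and 4, here is a hypothetical loading function with type hints and a tqdm progress bar; the function name, file layout, and use of pandas are illustrative assumptions, not part of any particular project:

```python
from pathlib import Path

import pandas as pd
from tqdm import tqdm


def load_participant_data(data_dir: Path, file_names: list[str]) -> dict[str, pd.DataFrame]:
    """Load one CSV per participant, showing a progress bar for long runs."""
    participant_data: dict[str, pd.DataFrame] = {}
    for file_name in tqdm(file_names, desc="Loading participants"):
        participant_data[file_name] = pd.read_csv(data_dir / file_name)
    return participant_data
```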

🔄 Reproducibility Measures

  1. Consistent random seed management: Centralized seed setting across all libraries (Python random, NumPy, PyTorch, scikit-learn). Implement it in a dedicated module (e.g., utils/seed.py) with cross-library support, and use a singleton configuration pattern (e.g., in config.py) to ensure seeds are consistent throughout the codebase (see the sketch after this list).

  2. Deterministic computation: Configure PyTorch for deterministic behavior. Even though this is a little slower, it rarely matters for research code that is never deployed to production (also covered in the sketch below).
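
A minimal sketch of both points, assuming a utils/seed.py-style module and a NumPy/PyTorch stack; the function name is illustrative, and you would adapt the calls to whatever libraries you actually use:

```python
import os
import random

import numpy as np
import torch


def set_global_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch, and request deterministic kernels."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)  # scikit-learn draws from NumPy's generator unless you pass random_state
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic (slightly slower) computation; acceptable for research code.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True)
    # Some CUDA operations additionally need this variable to behave deterministically.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
```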

🧩 Project Organization

  1. Clear point of entry: It can often be quite confusing to know where to start with a code base. A clear point of entry documented in the README serves as a “Start here” marker on a map.

  2. Modular Structure: When something goes wrong, it is useful to be able to isolate the problem. Clear separation of components (e.g., preprocessing, models, dataloader). Logical package hierarchy following software engineering principles. Each module has a specific responsibility (Single Responsibility Principle [4]).

  3. Standardized directory layout: Well-documented project structure in README. Consistent organization for code, data, tests, and documentation.

📦 Installation and Environment Management

  1. Dependency management: Conda environment specifications in environment.yml. Development-mode installation with pip’s -e flag, so any changes you make propagate through the entire code base without reinstalling. Graceful handling of optional dependencies using try/except blocks (see the sketch after this list).

  2. Automation scripts: Use automation liberally to make reproducing your results effortless. For example, a shell script (e.g., clean.sh) can run the preprocessing pipeline, and another (e.g., setup.sh) can automate environment setup and activation.
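
A minimal sketch of the optional-dependency pattern from point 1; matplotlib and the function name are placeholders for whichever nice-to-have package your project uses:

```python
# Optional dependency: plotting is nice to have, but the pipeline runs without it.
try:
    import matplotlib.pyplot as plt
    HAS_MATPLOTLIB = True
except ImportError:
    plt = None
    HAS_MATPLOTLIB = False


def maybe_plot_losses(losses: list[float], out_path: str = "losses.png") -> None:
    """Plot the loss curve if matplotlib is available; otherwise skip gracefully."""
    if not HAS_MATPLOTLIB:
        print("matplotlib not installed; skipping the loss plot.")
        return
    plt.plot(losses)
    plt.savefig(out_path)
```

For the editable install itself, the usual commands are `conda env create -f environment.yml` followed by `pip install -e .` from the project root.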

💻 Cross-Platform Compatibility

  1. OS-Specific Considerations: Detailed troubleshooting sections for Windows, macOS, and Linux. Line ending management for Windows systems. WSL recommendations for better Windows compatibility.

  2. Path Handling: Platform-independent path management (see the sketch after this list). Clear instructions for different operating systems.
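
A minimal sketch of platform-independent path handling with pathlib; the directory names are illustrative:

```python
from pathlib import Path

# Build paths relative to this file so they resolve correctly on Windows,
# macOS, and Linux, regardless of the current working directory.
PROJECT_ROOT = Path(__file__).resolve().parent
DATA_DIR = PROJECT_ROOT / "data" / "raw"
RESULTS_DIR = PROJECT_ROOT / "results"
RESULTS_DIR.mkdir(parents=True, exist_ok=True)
```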

📋 Documentation

  1. Comprehensive README: Clear project overview and purpose. Step-by-step installation instructions. Detailed usage examples. Troubleshooting guidance. Any modern large language model (LLM) can do this really well, so there is no excuse for not having good documentation anymore: simply show it your code and ask it to update your README. This is even easier if you are using a development environment with an integrated LLM (e.g., Cursor or GitHub Copilot).

  2. Code Documentation: Docstrings following a standard format (parameters, returns, descriptions); a sketch follows this list. Module-level documentation explaining purpose and components. Comments explaining complex logic or algorithms.

  3. Sub-module Documentation: Dedicated README files for key components (e.g., preprocessing). Explanation of component interactions.
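
A minimal docstring sketch for point 2 (NumPy style here, though any standard format works); the function itself is just an illustration:

```python
import numpy as np


def zscore(values: np.ndarray, axis: int = 0) -> np.ndarray:
    """Standardize an array to zero mean and unit variance.

    Parameters
    ----------
    values : np.ndarray
        Input data, e.g., one row per participant.
    axis : int, optional
        Axis along which to standardize (default 0).

    Returns
    -------
    np.ndarray
        The standardized array, same shape as the input.
    """
    return (values - values.mean(axis=axis, keepdims=True)) / values.std(axis=axis, keepdims=True)
```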

🔍 Testing Framework

  1. Test Suite Setup: Pytest configuration. Test data management. Conditional test execution based on data availability (see the sketch after this list). I use coding language models for this purpose: I find them really helpful at coming up with test cases, especially ones you might not think of yourself. It probably doesn’t make sense to test every possible thing (certainly a point of diminishing returns), so specifically asking for high-leverage tests is useful.

  2. Example Tests: Unit test examples provided. Clear instructions for running tests.
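
A minimal sketch of a test that is skipped when the data is not available; the file path and column name are hypothetical:

```python
from pathlib import Path

import pandas as pd
import pytest

DATA_FILE = Path("data/raw/participants.csv")  # hypothetical location of the raw data


@pytest.mark.skipif(not DATA_FILE.exists(), reason="raw data not available on this machine")
def test_participant_ids_are_complete() -> None:
    """Fails loudly if the raw data contains missing participant IDs."""
    df = pd.read_csv(DATA_FILE)
    assert df["participant_id"].notna().all()
```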

📊 Data Management

  1. Data Organization: Clear directory structure for data files. Visual representation of the expected data layout. Instructions for data acquisition (including data usage agreement requirements). Please take the time to find out where the data you used came from and how others can get access to it.

  2. Data Preprocessing Pipeline: Standardized preprocessing steps. Output format specifications.

🐞 Debugging and Error Handling

  1. Verbose Output Options: Instructions for enabling detailed logging. Configuration for different levels of verbosity (see the sketch after this list).

  2. Graceful Error Handling: Informative error messages. Fallback mechanisms for non-critical features.
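
A minimal sketch combining both points: a verbosity flag controls the logging level, and a non-critical failure produces an informative warning instead of crashing the run. The flag, function, and error are illustrative:

```python
import argparse
import logging

logger = logging.getLogger(__name__)


def generate_report() -> None:
    """Hypothetical non-critical step, e.g., writing a summary to a shared drive."""
    raise OSError("shared drive not mounted")


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="-v for INFO, -vv for DEBUG")
    args = parser.parse_args()
    level = {0: logging.WARNING, 1: logging.INFO}.get(args.verbose, logging.DEBUG)
    logging.basicConfig(level=level)

    try:
        generate_report()
    except OSError as err:
        # Informative message, then fall back instead of aborting the whole run.
        logger.warning("Report generation failed (%s); continuing without it.", err)


if __name__ == "__main__":
    main()
```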

🔧 Additional Software Engineering Best Practices

  1. Configuration Management: Centralized configuration through a Config class and environment variables (see the sketch after this list). Separation of code and configuration.

  2. Utility Functions: Reusable utility modules. Consistent error handling patterns.

  3. Version Control Guidance: Git configuration advice. .gitignore for data and environment files.
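
A minimal sketch of centralized configuration through a Config class backed by environment variables (point 1); the variable names and defaults are illustrative:

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Config:
    """Single source of truth for paths, seeds, and hyperparameters."""
    data_dir: Path = Path(os.environ.get("PROJECT_DATA_DIR", "data"))
    results_dir: Path = Path(os.environ.get("PROJECT_RESULTS_DIR", "results"))
    seed: int = int(os.environ.get("PROJECT_SEED", "42"))
    num_epochs: int = int(os.environ.get("PROJECT_NUM_EPOCHS", "10"))


CONFIG = Config()  # import this everywhere instead of scattering constants
```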

📝 Reporting and Visualization

  1. Plotting Utilities: Non-interactive backend configuration for environments without a graphical user interface (e.g., a remote cluster): set up plotting to save figures to disk rather than displaying them on screen (a sketch follows). Use standardized visualization functions. Include example notebooks for analysis workflows.
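
A minimal sketch of that setup using matplotlib’s file-only Agg backend; the function and output path are illustrative:

```python
import matplotlib

matplotlib.use("Agg")  # select the file-only backend before importing pyplot
import matplotlib.pyplot as plt


def save_loss_curve(losses: list[float], out_path: str = "results/loss_curve.png") -> None:
    """Write the loss curve to disk instead of opening a window."""
    fig, ax = plt.subplots()
    ax.plot(losses)
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```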

🎯 Project Management

knitr::include_graphics("thesis_trello_2.jpeg")

Figure 1: A project management space (pick your favorite, I use Trello for no particular reason except that it works). Keep it simple, include collaborators, use comments liberally so people can know what you’re upto.

Figure 1: A project management space (pick your favorite, I use Trello for no particular reason except that it works).  Keep it simple, include collaborators, use comments liberally so people can know what you're upto.
  1. Collaboration space: Use a space (e.g., Trello) to keep track of what you are up to. I have tried to work such that no one actually needs to talk to me to simply get an update. This helps you come back to your work without having to spend hours trying to remember where you left off. It helps collaborators keep up with what’s going on without wasteful meetings [5]. It makes meetings faster because you have already taken notes along the way. I use a simple Trello board with an Agile workflow, decomposing from big items (Epics) to small tasks (to-dos). You can find plenty of sources online for this approach.

  2. GitHub Issues: Issues make it easy for everyone to know what’s going on. I use the issue template below. It follows the principles of conjectures (guessing) and refutations (testing) [6].

knitr::include_graphics("issues.jpeg")

Figure 2: Github issues. Written this way, I find that I can always come back to understand what I changed at any given time, and why. Maybe I want to avoid trying an idea that I’ve already considered? That will be found here.

Figure 2: Github issues. Written this way, I find that I can always come back to understand what I changed at any given time, and why. Maybe I want to avoid trying an idea that I've already considered? That will be found here.
  1. What’s happening?: A clear description of the problem, with images and examples if possible.
  2. What could it be?: Best guesses at what is causing the problem.
  3. How can they be tested?: Ideas on how our best guesses can be tested.
  4. What was found?: The ultimate source of the problem, potentially from the initial best guesses but not always.
  5. How was it fixed?: The fix that was ultimately implemented after trying out some solutions.

References

[1] Knuth, D.E. (1974). Structured programming with go to statements. ACM Computing Surveys 6, 261–301. doi:10.1145/356635.356640.
[2]
[3] Peters, T. (2004). PEP 20 – The Zen of Python.
[4] Martin, R.C. (2003). The single responsibility principle. In Agile Software Development, Principles, Patterns, and Practices.
[5] 37signals. Getting Real.
[6] Popper, K.R. (2002). Conjectures and Refutations: The Growth of Scientific Knowledge (London: Routledge).