Hacker News Re-Imagined

Ask HN: Should I publish my research code?

I'm looking for advice on whether I should publish my research code. The paper itself is enough to reproduce all the results; however, the implementation can easily take two months of work to get right.

In my field, many scientists tend not to publish either the code or the data; they mostly write a note that code and data are available upon request.

I can see the pros of publishing the code: it's obviously better for open science, and it makes the manuscript more solid and easier for anyone trying to replicate the work.

But on the other hand, it's substantially more work to clean and organize the code for publishing, and it will increase the surface for nitpicking and criticism (e.g. coding style). Besides, many scientists look at code as a competitive advantage, so publishing the code would mean giving that advantage up.

  • 420 points
  • 7 days ago

  • @jarenmf
  • Created a post

@jonas_kgomo 7 days


Balaji[0] goes into a long thread on why reproducibility is important. [0] https://twitter.com/balajis/status/1337620554971410434

@guy-brush 7 days

In my view and personal experience, the pros outweigh the cons:

* You increase the impact of your work and, as a consequence, might also get more citations.

* It's the right thing to do for open and reproducible research.

* You can get feedback and improve the method.

* You are still the expert on your own code. It's unlikely that someone will pick it up, implement an idea you also had, and publish before you.

* I never got comments like "you could organize the code better", and I don't think researchers tend to make them.

* Via the code you can get connected to groups you haven't worked with yet.

* It's great for your CV. Companies love applicants with open-source code.

@the_snooze 7 days

Does your publication venue have an artifact review committee? That would be a good way to share your code and (redacted or anonymized) data. I'm in security/privacy research, and our venues recently started doing this. They serve as a quality check, labeling your artifacts from merely "submitted" to "results reproduced."

https://www.usenix.org/conference/usenixsecurity22/call-for-...

https://petsymposium.org/artifacts.php

@cjekel 7 days

So it really depends upon what you want out of your research career. Part of being a successful researcher is making an impact on the community. This involves producing works that the community finds useful. I've always looked at making my code available as another avenue to help increase the impact of my work. In my case, many more people have used my public codes than have ever read my papers.

You have limited time. I'd prioritize that time on what you think others will find useful.

Don't worry about ugly code. There are research codes with 1k+ stars on GitHub that are ugly. They have so many stars because people find them useful.

You absolutely don't have to publish your code, or anything else for that matter. Don't let the drive for impact on the community force you into working on something you're not interested in.

Congrats on your publication.

@grahamlee 7 days

Hi, I’m a Research Software Engineer (a person who makes software that helps academics/researchers) at a university in the UK. My recommendation is that not only do you publish the code, you mint a DOI (digital object identifier, Zenodo is usually the go to place for that) for the specific version that was used in your paper and you associate them. And you include a citation file (GitHub supports them now: https://docs.github.com/en/repositories/managing-your-reposi...) in your software repo.
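For illustration, a minimal sketch of what such a citation file might contain; every value below (names, title, version, DOI, date) is a hypothetical placeholder to replace with your own details:

```yaml
# CITATION.cff lives at the root of the repository.
# All values below are hypothetical placeholders.
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "my-research-code"
authors:
  - family-names: "Doe"
    given-names: "Jane"
version: "1.0.0"
doi: "10.5281/zenodo.0000000"  # the DOI minted (e.g. via Zenodo) for this release
date-released: "2024-01-15"
```

GitHub then shows a "Cite this repository" button and can export BibTeX from this file.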

Benefits: people who want to reproduce your analysis can use exactly the right software, and people who want to build on your work can find the latest in your repo. Either way, they'll know how to cite your work correctly.

In practice drive-by nitpicking over coding style is not that common, particularly in (some) science fields where the other coders are all other scientists who don’t have strong views on it. Nitpicks can be easily ignored anyway.

BTW should you choose to publish, the Turing Way has a section on software licenses written for researchers: https://the-turing-way.netlify.app/reproducible-research/lic...

@hannob 7 days

In some sense, the way you phrase your question shows how broken the incentives in science are.

The obvious answer for science is: publish. The goal of science should be to make it easy for others to reproduce your work. Not to make it theoretically possible, but hard, because of the "competitive advantage".

The right thing to do would be to publish, and next time you review a paper that does not publish code, use that as a reason to reject it. The whole "code and data upon request" promise is obvious bullshit; there have been studies on it, and often enough it ends up with "well, we don't have that code/data any more", "why do you need that? we won't help you plan to publish something we don't like", and so on.

@908B64B197 7 days

> But on the other hand it's substantially more work to clean and organize the code for publishing

Make sure it's all safe to publish but don't spend any effort on organizing it, unless you can find some grant money for an undergrad to work on it.

If it has users they will contribute their changes to better organize it and use it.

@gnfargbl 7 days

There's a lot of advice here, but very little data to support any of it. Since you're a scientist, why not take an experimental approach to answering this question? Publish your code, for one (selected) paper. Monitor (a) the download log, and (b) the emails you get related to your code.

I hypothesize that you will see some combination of three effects: (1) you will get lots of downloads (which means people are using your code, good work!), perhaps with lots of follow-up emails and perhaps not depending on what the code does; (2) you will get lots of emails from random nutjobs looking to pick holes in your work, and you will waste your time answering them; (3) you will get almost completely ignored.

Whatever the outcome, I think a lot of people would be interested in hearing about what you learn.

@bombcar 7 days

Publish the code or attach it as a listing, so that in 10, 15 years someone who finds your paper can find the code, too. When everything is "hot" and "live" it can be easy to reach out and get something, but when you're digging through papers and code that have been abandoned for decades, it's nice to find source.

@dusted 7 days

I'd only clean it of stuff like passwords and such, and add a header that the code is provided as-is.

You could add a disclaimer that the code was worked on until it provided a satisfactory result, and no further, and is not intended for (any) use. You might even add that, except for outright, actual errors that affect the result of the research, comments are discouraged.

I often publish very bad code, terrible terrible spaghetti, it's not how I write code at my job, because at my job, I'm paid to produce not only working and correct code, but also code that is maintainable and understandable and follows certain practices.

However, my hobby is not writing corporate code, but writing code that gets done what I want to get done, nothing more, and sometimes less. It might even have actual bugs in it that I can plainly see and don't care about because they don't affect my uses.

If people can't tell the difference, I don't care, not my problem. If a future employer can't tell the difference, I won't work with them.

@spaniard89277 7 days

Yes, please. The current state of affairs is that it's impossible to get code, data, and pretty much anything besides the actual paper.

To me, at least, it sends a signal of people hiding stuff. That's not good. It made me distrust some papers in the past. I tried to reach out, with no success.

@amitport 7 days

> it's substantially more work to clean and organize the code for publishing, it will increase the surface for nitpicking and criticism (e.g. coding style, etc).

It should take a couple of hours. The code works? You know how to reproduce what you did, right? It doesn't need to be perfect. It doesn't even need to pass code review. It just needs to work.

> many scientists look at code as a competitive advantage so in this case publishing the code will be removing the competitive advantage.

Well depends on the field I guess, but you also want recognition and impact. What is the point of publishing a result no one uses?

@mrek0 7 days

It would be nice if everybody published code for their papers. But in a field where most people don't, releasing your code will probably not be beneficial for you, due to the loss of the competitive advantage. I know this sounds weird to people with a CS background, but it is the reality in academia.

In your position, I would only release code which is not too hard to reproduce anyway, or which provides only a negligible competitive advantage for you. I mainly have "normal" papers in mind (experiments or data analysis) - if the main contribution is, for example, an algorithm which you want people to use, then you should obviously publish an implementation.

@jgrahamc 7 days

My co-authors and I argued here https://www.nature.com/articles/nature10836?proof=t%2Btarget... for open computer code in science.

"Scientific communication relies on evidence that cannot be entirely included in publications, but the rise of computational science has added a new layer of inaccessibility. Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail."

@nvr219 7 days

Well the saying is "publish or perish" so I would definitely choose publish.

@eslaught 7 days

While I publish in a field where making source code available is much more common, let me just make a couple of points:

* I have never had someone come back to criticize my code style. And if they do, so what? I'll block them and not think about it again. I don't need to get my feathers ruffled over this.

* Similarly, if someone's trying to replicate my results, and they fail, it's on them to contact me for help. After that it's on me to choose how much effort to put into helping them. But if they don't contact me, or if they don't put in a good faith effort to replicate the results, that's their problem. If they try to publish a failure to replicate without having done that, it's no more valid science than publishing bad science in the first place.

Overall, I think most people who stress about publishing code do so because they haven't done it before. I've personally only ever had good consequences from having done so (people picking up the code who would never have done anything with it if it weren't already open source).

@jasone 7 days

Been there, done that. I published my doctoral research code [1] so that others could inspect, verify, replicate, extend, etc. YMMV, but the feedback I received from other researchers ranged from neutral to surprisingly positive (e.g. people using it in ways that pleasantly surprised me). But let me expand on my own experiences while developing that software, trying to figure out how to replicate the then-current state of the art.

At the time there were two widely used software packages for phylogenetic inference, PAUP* [2] and MrBayes [3]. The source code for MrBayes was available, and although at the time I had some pretty strong criticisms of the code structure, it was immensely valuable to my research, and I remain very grateful to its author for sharing the code. In contrast the PAUP* source was not available, and I struggled immensely to replicate some of its algorithms. As a case in point, I needed to compute the natural log of the gamma function with similar precision, but there was no documentation for how PAUP* did this. I eventually discovered that the PAUP* author had shared some of the low-level code with another project. Based on comments in that code I pulled the original references from the 60s literature and solved these problems that had plagued me for months in a matter of days. Now, from what I could see in that shared PAUP* code, I suspect that the PAUP* code is of very high quality. But the author significantly reduced his scientific impact by keeping the source to himself.

[1]: https://github.com/canonware/crux

[2]: https://paup.phylosolutions.com/

[3]: http://nbisweden.github.io/MrBayes/

@morelandjs 7 days

I personally do not trust research that does not have reasonably polished publicly available code behind it.

A strong result isn’t just the final number; it’s also the process by which you arrived there.

@jonnycomputer 7 days

The code should be published, and knowing this, researchers will hopefully try to avoid certain commonly harmful practices. One of these is re-using the same script to run slightly different models by editing some of the hard-coded parameters. I've found more than one mistake in someone else's reported results due to this sort of thing, but identifying it was quite a bit of trouble because the record of what was run was erased when they moved on to the next model.
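One way to avoid that edit-and-rerun trap, sketched here in Python with hypothetical parameter names, is to take every tunable value from the command line, so that each run's exact configuration survives in the shell history or job script instead of being overwritten in the source:

```python
import argparse

def parse_args(argv=None):
    """Read model parameters from the command line rather than editing
    hard-coded constants, so the record of each run is preserved."""
    parser = argparse.ArgumentParser(description="analysis driver (illustrative)")
    parser.add_argument("--learning-rate", type=float, default=0.01,
                        help="hypothetical tunable parameter")
    parser.add_argument("--n-iterations", type=int, default=1000,
                        help="hypothetical tunable parameter")
    parser.add_argument("--outfile", default="results.csv",
                        help="where this run's results are written")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    # Log the full configuration alongside the results for provenance.
    print(vars(args))
```

Each invocation (e.g. `python run.py --learning-rate 0.05`) then documents itself, and no earlier model's settings are lost.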

What I would not expect from people is code that would necessarily run in your environment. For example, in many cases, the paths are going to be hard-coded, for a variety of reasons. It might be ideal to write code that will just work, in a reproducible environment, but that often takes more work than people are willing to commit to, given all the other things they have to do.

Finally, cleaning up your code for presentation is a final opportunity for you to discover any mistakes before you publish and then later have an embarrassing public retraction.

@quag 7 days

After finding a mistake in a paper, having to fix it, and then publishing my code, I’ve found other people contact me for the fix rather than the author of the paper. I would recommend publishing the code rather than assuming your paper is bug free and complete.

Similarly, I’ve found papers that don’t include their complete data set in the paper, and had to try to reverse engineer it from images and so on. It is really frustrating when papers are incomplete.

@titzer 7 days

Be prepared for a metric crapton of crushing silence when you release your code. But do release your code.

@neilv 7 days

Can you ask scientists who are very experienced in your field and successful in the career track that you want to be in?

Separate from that, is there fairly new chatter in your field about reproducible science, publishing code and data, etc.? If so, what's the current thinking there about how valuable this is to collective science, and how that should affect the sometimes unfortunate conflicts of interest between career and science?

@rep_movsd 7 days

Most code in the world is crap code, so don't worry, putting it out there lets people make it better

@sevagh 7 days

How do you even know it works if you haven't been able to create a working implementation as the author?

@born2discover 7 days

This one depends both on the field you are in and on your own academic philosophy, OP.

If the paper is enough to reproduce the results AND cleaning up the code is tedious, then adding the "code and data are available upon request" note seems both fair and justified.

That way, whoever wants the code can still ask for it and it does not lay an unnecessary burden on the author.

@sterlinm 2 days

The Carpentries[0] provide some great resources and training for academics who are interested in this. Check out Software Carpentry[1] and Data Carpentry[2] in particular.

Publishing research code is admirable, and in an ideal world everybody would publish their code and data. That said, we shouldn't pretend that there aren't tradeoffs. Time spent polishing your code to make it presentable is time not spent on other aspects of your research. Time spent developing software development skills is time that could be spent learning new research techniques (or whatever). Reproducible research is great, but it's certainly possible to take it too far at the expense of your productivity/career.

You should also take your own personality into account. If you're a perfectionist you might struggle to let yourself publish research-quality code rather than production-quality code and consequently over-allocate the time you spend prettying up your code.

[0] https://carpentries.org/

[1] https://software-carpentry.org/

[2] https://datacarpentry.org/

@cowpig 7 days

What is the purpose of doing research?

If the purpose is to push human knowledge forward, then it seems backwards not to publish everything.

Personally, I've found it difficult in my various careers to date when I've been put in positions where the actions that serve my immediate interests are in any way in conflict with my underlying principles or overarching goals. It's demotivating and deflating.

If I were in your position, I would publish everything and let myself feel pride in what I did. Even if we're all just insignificant specks in the grand scheme of things, pursuing a greater purpose can help make it feel like something matters.

@ta988 7 days

Yes, do it! I did, and you'll not receive criticism; that's just anxiety talking. Be clear about what the code is and why, and note that you may not maintain it, as it is just a proof-of-concept implementation. Most good humans understand that not all researchers are the-one-and-perfect coder. The bad ones are too busy arguing with others to even notice.

@cblconfederate 7 days

Depends on the field. Let's assume you are not in math/CS/physics.

What can go "wrong"

- Someone may find a minor rounding error and now you have to issue a correction to the paper which, laudable as it is, is a bad thing

- You 'll end up having to maintain an open-source-something and possibly forks

- Your open source code may end up as a GitHub repo in which you are just one of the contributors, not the owner, and others leech credit from you.

- People who want to criticise you will find excuses in the coding style.

Research code is messy -- it must be messy, imho, or else it's probably insignificant. People who don't publish it are definitely shielded by obscurity, while I have received scrutiny for entirely inconsequential details. You can choose to publish it in a less accessible way, which will thwart people with bad intentions. Even publishing it as a tarball on a web server is enough work to keep them away.

@alphagrep12345 7 days

Publishing it gives an additional advantage that people are more likely to use and cite your work over a peer's who has a similar paper but no code.

@cinntaile 7 days

If you publish your paper with code, you'll get more citations I would assume. When I look at research papers, one of the first things I look at is code and/or data availability. It would be even better if it's easy to run though and that's definitely not always the case.

@pabs3 7 days

Yes definitely, and turn it into an open source project. You'll get more citations that way too. The Debian science team can probably help with some of the process.

https://wiki.debian.org/DebianScience

@tpoacher 7 days

> The paper itself is enough to reproduce all the results.

Unlikely. Following the algorithm from scratch may produce "similar" results, but it will not "reproduce" them, bugs and all. The only thing that can do that is your code.

Plus, when you set out to reproduce a paper from only the algorithmic description, it's typically not until you're two or three weeks into coding that you realise the original paper made many assumptions in the code that were not explicitly stated in the paper.

> However, the implementation can easily take two months of work to get it right.

An even more important reason why you should release your code.

> In my field many scientists tend to not publish the code nor the data.

A regrettable state of affairs indeed.

> They would mostly write a note that code and data are available upon request.

I have personally come across many cases where this promise could no longer be honoured by the time of the request. Publish the code.

> I can see the pros of publishing the code as it's obviously better for open science and it makes the manuscript more solid and easier for anyone trying to replicate the work.

It is also increasingly a requirement from funding bodies.

> But on the other hand it's substantially more work to clean and organize the code for publishing

Then don't. Release it under the CRAPL, stating as much. It is still better than nothing.

> it will increase the surface for nitpicking and criticism (e.g. coding style, etc).

If you were an entrepreneur hoping to peddle snake oil and not get found out, then I would see your point. But you're a scientist, you're supposed to welcome such criticism and opportunities for improvement. If anything, you might even get collaborations / more publications on the basis of improving on that code.

> Besides, many scientists look at code as a competitive advantage so in this case publishing the code will be removing the competitive advantage.

I would sincerely not feel very comfortable calling such people "scientists".

@rubyist5eva 7 days

There is a reason most free-software licenses include a clause like the following:

"THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE."

So just pick one that is compatible with the 3rd party code you used to write your software (mostly pertaining to copyleft licenses like the GPL) - MIT and BSD licenses are generally "fine" - and just publish it. Just because your code is not "clean" or whatever doesn't preclude it from being free.

@dqpb 7 days

> it's substantially more work to clean and organize the code for publishing

Regardless of whether or not you release the code, you should do this.

It’s so common for people to think that cleaning/refactoring/documenting code is a waste of time, but it’s exactly the opposite.

The point at which the code is working, but not yet polished is exactly the prime “teachable moment” for improving your skills as a programmer and for refining your knowledge of the domain the program solves for. (This is true no matter how skilled or knowledgeable you already are).

Your brain is perfectly primed to do this now, so don’t let that go to waste.

@Communitivity 7 days

At the end of the day, the impact and perceived quality of your research correlate with how peer-reviewed it is and how reproducible it is. Everything necessary to reproduce your research should be published, including the code. However, if you publish cleaned-up versions of your code, that isn't the code you used to do your research.

I suggest publishing the code as-is on something such as GitHub, GitLab, etc. I suspect you have ideas on how you can improve the code; perhaps there's even a way of improving your research methodology by doing so, enabling new insights with further research. If you did a follow-up experiment with improved analysis enabled by your improved code, then that's another paper, and another (more cleaned-up) version of the code to push to the repository.

The above is all supposition though, as I don't know your field. If deep learning then the above seems more likely. If your field is geology, then improvements in the software might not enable better insights.

@phkahler 7 days

>> Besides, many scientists look at code as a competitive advantage so in this case publishing the code will be removing the competitive advantage.

That "competitive advantage" is just holding everyone back and slowing progress. This is particularly annoying to hear coming from "research", which I thought was supposed to be advancing the state of the art for the benefit of society. That's ostensibly the reason for publishing papers, right: to disseminate knowledge? Or is it really just to inflate one's ego and get paid?

Not saying you should publish code, just that deliberately keeping secrets in your field seems to go against what I thought you were doing.

@rob_c 7 days

TL;DR YES!!!

Frankly any paper which can't offer the basics of reproducibility is adding to the current problems across many fields.

Real-world data may be restricted by copyright, so be careful if this applies. If it does, consider publishing with some Monte Carlo (MC) data demonstrating how things are supposed to work. (You did verify your code's behaviour, didn't you?)

Don't clean and organise code for publishing. It is a tool, it is not 100% perfect, but it is supposed to work. Unfortunately after years in the field sometimes the correct response to nit-picking is "I don't care".

This is the trap between "writing code that was intended to give an answer" and "code that was intended to be re-used by others". Scientists often write code that fits into the former, and this code should be published (in case of mistakes and in the interests of reproducibility). But this code should never be taken to be of a quality that can be built on by others unless this was the express intent. People who mistake it for that haven't understood the point of the work the author is engaging in.

With regard to what license, I tend to use DWTFYWWI, or just GPL, but frankly you can pick some wonderfully closed thing if you think your code might revolutionise something which in principle stops commercial entities ripping it off directly.

@hosteur 7 days

Publish the code as is and move on.

@zahllos 7 days

I think you probably already have the answers you were seeking (yes, ideally you should) but I'd like to add some points:

Ideally, it would be nice if the code has a professional-level quality to it, but I think everyone involved in evaluating research understands that it is at best a prototype. Proper software engineering is expensive, and it is not the role of research to do this. The process, as it was explained to me: university research pushes the state of the art, industrial research labs are slightly behind this and looking to transfer into practical uses (along with this some government agencies are interested in tech transfer) and finally software engineering takes these ideas and turns them into actual products. You aren't making a product, so it is OK for the code not to be perfect (also, from experience, 'professional' industry code is not always that great either). The main point is that someone has some chance of reproducing your results.

The exception to this is if you are making a product, where the definition of product is a tool for further research. Examples might be tools for symbolic execution or formal verification, in which case it might be worth some time to make the experience of using it good for that benefit, to reduce friction so that people try and want to use your tool.

Artefact evaluation is rapidly becoming something people are encouraged to do, and it helps enormously in verifying results. The point is usually to reproduce the results of the paper to back up the science, not to start an argument over coding style, and I would hope that artefact evaluation processes make this clear and ensure that evaluations focus on reproducibility. For outside comments that might arise, I suggest you publish the work as open source and respond to any criticism with a fairly standard line: yes, this is research-quality code and we would like to have time to improve it; if you would like to submit a patch/pull request, we would welcome any help.

@orforforof 7 days

I would agree with many others here who say publish it. In some fields there is an additional question of where to host it, lest your paper's impact outlast the lifetime of your current GitHub repo or whatever. There are good solutions out there. Assuming you are at a university it's worth having a chat with a librarian.

@rcthompson 7 days

FWIW, this is how I've released the crappy barely-working "academic quality" code for a paper in the past:

https://github.com/DarwinAwardWinner/cd4-histone-paper-code

The main points are that I made only a minimal attempt to organize it, and I made the state of the code clear in the README. I don't recall anyone complaining about the code or even mentioning it during review. (Though to be fair, I also don't recall whether I published the code before or after the paper was accepted.)

Looking at things from the other side, I'm at least an order of magnitude more likely to read, use the work/methods from, and therefore cite a paper that comes with code.

@TOMDM 7 days

My advice is to leave industry standard formatting and style arguments to engineers.

If people want great code that runs easily and is easy to read, that's engineering work, built off the back of novel implementations.

If people want novel implementations that are likely rough around the edges and require a bit of finagling to run, leave that to the scientists.

@lallysingh 7 days

Honestly, put your code out, and version control it. Benefits:

- People who use your work will cite you.

- You may get collaborators.

- It's an easy-to-get-to backup

- For non-academic jobs, it's part of your resume

@JoaoCostaIFG 7 days

I'm currently a student getting a master's in computer science. In my experience, having the code available for research papers is rare, but useful. Many times when reading research papers I find myself struggling to understand how something can be implemented or, when there are several options, which one the authors chose. When the paper has the code published, I am able to follow it better.

Some papers link to the code instead of including it. Maybe I'm just unlucky, but this usually leads to dead links (but that's a different topic altogether).

Reply


@lofatdairy 7 days

Replying to @jarenmf 🎙

Personally, as someone who's had to dig through academic code in the field of bioinformatics, I do appreciate code being attached to the paper, regardless of the paper's level of detail or the code's quality (or lack thereof). I don't think many researchers expect high quality code unless you're releasing a library explicitly for general use and expect contributors. That said, a brief README with at least instructions to execute is a rare but welcome addition in my personal opinion.

Reply


@cosmojg 7 days

Replying to @jarenmf 🎙

Yes, absolutely, and don't worry about how the code looks. So long as someone else can download your code and run it without issue, you're good to go. I've worked on multiple computational neuroscience papers and pushed to have the code published alongside each paper in every case. Not once has it come back to bite us, and if anything, it seems to get us significantly more citations.

Do it. There's no good reason not to.

Reply


@jll29 7 days

Replying to @jarenmf 🎙

Publishing the code is great, as some of the questions a reader of your paper may have can only be answered by looking into the source code, as no paper has enough space to talk about all implementation details in a real-life complex system.

There is value in scrutinizing the code - not w.r.t. coding styles or standards but to discover bugs in the implementation, which are very common. Scientists are only human, and scientific software is less often checked by a second pair of eyes. There is also value in trying to replicate a study from scratch with a fresh implementation only from the details in the paper. Many conferences, for instance the European Conference on Information Retrieval (ECIR), Europe's largest scientific search technology conference, have a replication track reserved for replication papers, and these are often the most interesting/insightful papers.

It occasionally happens that a result is not caused by what the authors think, but is merely an artifact of the implementation code. A very famous MIT researcher (not naming him or her here on purpose) fell into this trap in their Ph.D. thesis, but it can happen to anyone, really. Scientific results become objective knowledge as others solidify the body of knowledge by carrying out replications and arriving at the same results.

Whatever your decision about past code, going forward, if you plan to release all future research code, you will likely write better code in the first place, as you will constantly be aware that people will be looking at it, and that can only be a good thing.

Reply


@mhh__ 7 days

Replying to @jarenmf 🎙

Publish the code.

If someone has comments about style ask them to improve it for you.

Worry about maintaining things after someone asks for maintenance, the vast majority of code is never read again.

Reply


@agilob 7 days

Replying to @jarenmf 🎙

As my computer vision and AI professor said: there were hundreds of papers claiming to achieve faster or more accurate results than Viola-Jones, but they never published data sets or code, so no one believed them and all were forgotten.

Reply


@kgc 7 days

Replying to @jarenmf 🎙

You'll probably get more references if you have code, which will probably help your research career.

Reply


@grst 7 days

Replying to @jarenmf 🎙

Releasing the code is the very least you should do to make your analysis reproducible. I would be surprised if it was possible to exactly reproduce the results from the paper alone.

From Heil et al. (https://www.nature.com/articles/s41592-021-01256-7):

> Documenting implementation details without making data, models and code publicly available and usable by other scientists does little to help future scientists attempting the same analyses and less to uncover biases. Authors can only report on biases they already know about, and without the data, models and code, other scientists will be unable to discover issues post hoc.

Even better would be to containerize all software dependencies and orchestrate the analysis with a workflow manager. The authors of the above paper refer to that as "gold standard reproducibility".

Reply


@bigodanktime 7 days

Replying to @jarenmf 🎙

Many conferences are starting to adopt a badge system and will evaluate your artifact. This is becoming more and more popular, and I know many researchers who keep these badges in mind when reading the evaluation in a paper. For example, here is the artifact evaluation that was done at SOSP 2021: https://sysartifacts.github.io/sosp2021/results.html

Reply


@anon_123g987 7 days

Replying to @jarenmf 🎙

One argument against publishing code is that your code may contain an error (or several) that validates your possibly mistaken theory, and forcing an independent reimplementation by others would uncover this problem.

Reply


@mbreese 7 days

Replying to @jarenmf 🎙

Unless you are publishing a software methods paper, you don’t have to worry about cleaning the code or making it portable. In my field, publishing code (and data) is a requirement and has been for years. That doesn’t mean that the code needs to be pretty (it usually isn’t), it just needs to support the paper.

So, yes. Please publish the code, it will make the rest of the paper stronger.

Reply


@acaloiar 7 days

Replying to @jarenmf 🎙

In the past I've chosen to publish key algorithms. Publishing your entire code base can create a substantial demand for support. As an open-source project maintained by one person, that can be very demanding.

So identify what's most critical or novel about your work and publish that.

Reply


@cycomanic 7 days

Replying to @jarenmf 🎙

As a fellow scientist I would say go for it. I know people who had a vast amount of citations (>2000) for a paper accompanying a code/program release that they made at an opportune moment (they released a code for designing/analysing photonic crystals just when the field was taking off).

Now, in the vast majority of cases you will only get a couple of people looking at your code (my experience so far), but I still think it's worth it. The question is whether to clean up the code or not. Ideally you would, because it increases the chance of someone using it by a lot. On the other hand, given the realities of academic work, this effort is largely underappreciated.

So I recommend finding a balance: clean up enough so that it is reasonably straightforward to run the code, and write a good README that points to the paper and gives the appropriate citation.

Reply


@egberts1 7 days

Replying to @jarenmf 🎙

Here is my checklist for publishing research code:

1. Is it encumbered by pending or active patent(s)?

2. Does it require release of proprietary holds by corporate partners or participants?

3. Does it have tangible market value worth pursuing? Then keep it to yourself.

4. Does it conflict with any trademarks, copyrights, or domain holds? Rename it.

That's just some of the points. Contact your local VCs if it has any traction.

Reply


@chriskanan 7 days

Replying to @jarenmf 🎙

Having a polished public implementation can lead to a massive increase in the number of citations a paper receives, if it is really a useful system. Some of my papers I think would have received far fewer citations if I had not released the code. Of course, if it is a really niche area with only a handful of researchers, this may not be true.

Reply


@ur-whale 7 days

Replying to @jarenmf 🎙

> The paper itself is enough to reproduce all the results.

I've heard this claim so many times, from many an author who had their brain so deep in the problem they were working on that they were 100% incapable of properly gauging the validity of this claim.

To verify that what you claim is true, wait two years to give your brain time to flush the context, pick your research paper up (and nothing else that wasn't made available to others) and try to reproduce the results on a brand new computer without any of the environment you developed your research with.

See how much blood you end up sweating.

PLEASE publish your research code. Don't worry about it being disgusting and hackish, it's research code, so by definition, no one expects it to be industrial strength.

Don't spend time cleaning it up either, your time is better spent on doing more research.

If you feel responsibility towards the community:

- put a huge disclaimer at the start of the README explaining what a mess the whole thing is *because* it's research code.

- if you really must: list requirements and provide a build.sh
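A build.sh for research code can be as simple as recording the exact commands you ran. A minimal sketch, with stand-in files so it runs end to end (all filenames here are hypothetical, not from the comment above):

```shell
set -e                                        # stop on the first error
# Stand-ins so the sketch runs as-is; your real files replace these:
printf '' > requirements.txt
printf 'print("results reproduced")\n' > run_experiments.py

python3 -m venv venv                          # isolated environment
. venv/bin/activate
pip install --quiet -r requirements.txt       # pin exact versions here if you can
python3 run_experiments.py                    # reproduce the paper's results
```

Even a crude script like this tells a reader which interpreter, dependencies, and entry point you actually used, which the paper alone rarely does.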

Reply


@phw 7 days

Replying to @jarenmf 🎙

I've written a few paragraphs on the topic. Maybe it's helpful: https://nymity.ch/book/#make-your-code-public

Reply


@evolarjun 7 days

Replying to @jarenmf 🎙

In my field (bio) it's fairly common not to publish code, but it's becoming more common to do so. Biologists' code is generally crappy and I think everyone understands that. The better developers are often valued for producing reliable tools that people can actually use, and they get lots of attention and citations for their tool papers.

The mathematicians and computer scientists I've worked with generally wrote more complicated code, but from a bugginess and maintainability standpoint I'm not sure it was any better. I had a mentor with an applied math degree who was extremely fond of one and two character variable names.

Just publish it. Unless your paper is a _BIG_DEAL_ barely anybody is going to look at it, and some people (hopefully the right people) will respect you for showing your work. I think I'm one of the few reviewers that actually try to run and maybe glance at the code for papers I review. In the papers I've reviewed I've never seen a comment that indicated any of the other reviewers even looked at it.

Reply


@Derbasti 7 days

Replying to @jarenmf 🎙

Source Code or it Didn't Happen.

Science that is not reproducible is not science.

If you can, publish something high-level. Matlab or Python or Julia is fine. C or Java, not so much, because the build environment will not be available any longer after a few years. Actually, if you can, publish several translations.

And don't forget to publish your data sets as well. And your data augmentation or whatever. Everything you need to reproduce your results.

And for the love of Knuth, DO NOT OPTIMIZE YOUR CODE. Dumb code is good code in science. You would not believe what kinds of havoc some algorithms wreaked on my systems in the name of optimization. Optimizations that made a ten-year-old algorithm run in two nanoseconds instead of four (vastly exaggerated). Optimizations that obfuscated otherwise perfectly reasonable algorithms.

The goal is reproducibility.

Reply


@ironrabbit 7 days

Replying to @jarenmf 🎙

As someone who has had to reproduce others' research results: it is much much better to release the unclean, unorganized code that actually produced your results than it is to release nothing. Even if it doesn't run (e.g. it depends on a hardware system that the user won't have access to) it's still better for people to be able to read your code and understand some tricky part that isn't fully explained in your paper.

Reply



@Uptrenda 7 days

Replying to @jarenmf 🎙

Let me say that if you do decide to release it, it's not just scientists and academics who stand to benefit. Chances are your paper is less approachable to those outside academia, and your code would be easier for an engineer to understand. I would honestly encourage all researchers to publish their code on that basis. You don't have to clean it up or write any scripts to help build it. Just attach what you have, and I second the idea to use the CRAPL license!

Reply


@low_tech_love 7 days

Replying to @jarenmf 🎙

Besides all the pros already mentioned, there is a high chance other researchers will use your code and cite you. If they have to compare their results with some previous state of the art, it will be the one with available code. The whole thing about “the paper is enough to reproduce” never happens, ever.

Reply


@reillyse 7 days

Replying to @jarenmf 🎙

To be frank, nobody cares about your code. I’d be shocked and flattered if anybody read any of the code I wrote during my PhD. Publish the code in its current state and move on. If people take the time to actually read and nitpick your code, you’ll have succeeded.

Reply


@Fiahil 7 days

Replying to @jarenmf 🎙

Release the code as-is. It's alright if it's not clean and organized; research code is usually crappy code (no offense intended).

Worst case scenario, it will end up in a star-less github repo that nobody reads.

Reply


@npteljes 7 days

Replying to @jarenmf 🎙

OP, you shouldn't worry about the state of your code. There could be criticism, but I don't think there's anything public that isn't criticized. A horrible thing that's open source is much better than something that's not. The only real things to consider here are the type of license and the competitive advantage you're talking about. With the license, sites like this[0] can help.

[0] https://choosealicense.com/

Reply


@register 7 days

Replying to @jarenmf 🎙

Yes, you should. Just publishing it as it is would be enough. Everybody understands that academic code is experimental, and nobody will judge whether it is pretty or not. The reason you should publish it is to gain trust. Back when I was doing my PhD I found several papers whose results were nearly impossible to reproduce, to the point that I sometimes believed they were just fakes. I'm pretty sure that in most cases they were not, but...

Reply


@mnw21cam 7 days

Replying to @jarenmf 🎙

Many journals now require relevant code to be published. Those journals that don't are likely to be lower impact journals, but also are probably moving towards requiring the relevant code to be published. The reviewers are likely to complain about the code not being available, so you can defeat one review hurdle by publishing it. It's generally better for science if you publish it.

Reply


@freemint 7 days

Replying to @jarenmf 🎙

If public money paid for developing it, make the source public under a liberal open-source license.

Reply


@counters 7 days

Replying to @jarenmf 🎙

Unambiguously, yes. If possible, release it using some sort of open source license, and grab a DOI for the initial and any subsequent release of the code - you can use Zenodo or some other tool for this.

I left the academic world a few years ago, but several of the analysis codes/models I published (either as stand-alone tools or artifacts published alongside a journal article) still regularly get used... if anything, there's probably a larger user base for one of my models today than there ever has been, and it's leading to a long-tail of publications where my initial work is either cited or I'm offered co-authorship when I have time to offer hands-on support for improving the model/code and offering my insight as a domain expert.

If you can take the time to clean up some code or author a lightweight package, that's amazing! But it's a bang-for-your-buck type thing. If you ever aspire to leave academia, it's undoubtedly worth spending some time to clean up the code, add documentation, add some unit tests, etc. - great artifacts to use in supporting a hiring process if you move into a technical role somewhere in industry. But it is far from necessary.

Reply


@enriquto 7 days

Replying to @jarenmf 🎙

> The paper itself is enough to reproduce all the results.

No, it isn't.

Reproducing the results means that you provide the code that you used so that people can reproduce it just by running "make" (or something similar). If you do not publish the code and the input data, your research is not reproducible and it should not be accepted in a modern, decent world.

It doesn't matter that your code is ugly. Nobody is going to look at it anyway. They are only going to call it. If the code is able to produce the results of the paper with the same input data, that's enough. If the code is not able to at least do that, this means that even you are not able to reproduce your own results. In that case, you shouldn't publish the paper yet.

Reply


@winterismute 7 days

Replying to @jarenmf 🎙

First, you should be proud of yourself for striving to do "the right thing".

In the field I follow the most (Computer Graphics/Rendering) I think there is a big problem with reproducibility as well, and to be honest, I think some of the major players actually have little interest in making this significantly better, since they can take advantage of the visibility of a flashy render/fps counter shown at an event while still keep on building a "moat" between them and others that want to adopt the same methods.

Which is in the end partly an answer to your question: your paper could clearly describe all the elements needed to implement a method correctly, but by providing a sample implementation you allow others to "stand" on your shoulders, as they say, instead of having to climb there first and then proceed. You can head off worries about the state of your codebase by making clear via README/documentation/license that it's still in the "proof of concept" phase.

One reasonable observation I have heard is that in some fields, during peer review, some reviewers seem to like to nitpick on the code rather than the paper, sometimes in subtle ways. Because of that, I think it can be (unfortunately) OK to release the code after acceptance or publication. But apart from this, I see only advantages.

Reply


@rasmusei 7 days

Replying to @jarenmf 🎙

I work as a researcher and I try to publish full source code for all my publications. On the point of increasing surface for nitpicking, I agree in principle that's a risk, but in practice I have not experienced any such problems in my field. I am in a field of applied natural science where most researchers write terrible code, if any, and so I suppose there are not much expectations or even concepts of coding style.

There is a nice Perspective piece in Science from 2011 [1] touching on the question of cleaning up the code. It suggests basically the same thing as several of the comments in this thread: if you don't have time or motivation to clean up the code, don't.

"even incremental steps would be a vast improvement over the current situation. To this end, I propose the following steps (in order of increasing impact and cost) that individuals and the scientific community can take. First, anyone doing any computing in their research should publish their code. It does not have to be clean or beautiful (13), it just needs to be available. Even without the corresponding data, code can be very informative and can be used to check for problems as well as quickly translate ideas. ... The next step would be to publish a cleaned-up version of the code along with the data sets in a durable non-proprietary format."

[1] Peng (2011) Science 334 1126-1127 https://doi.org/10.1126/science.1213847

Reply


@p4bl0 7 days

Replying to @jarenmf 🎙

Yes you should! And not only for ethical reasons (actually reproducible research, publicly financed work, etc), even if those are good enough by themselves.

I've always published my research code. Thanks to that, one of the tools I wrote during my PhD was re-used by other researchers and we ended up writing a paper together! In my field it was quite a nice achievement to have a published paper without my advisor as a co-author even before my PhD defense (and it most likely counted a lot towards me getting a tenured position shortly after).

The tool in question was finja, an automatic verifier/prover in OCaml for counter-measures against fault-injection attacks on asymmetric cryptosystems: https://pablo.rauzy.name/sensi/finja.html

My two most recently published papers also come with published code released as Python packages:

- SeseLab, which is a software platform for teaching physical attacks (the paper and the accompanying lab sheets are in French, sorry): https://pypi.org/project/seselab/

- THC (trustable homomorphic computation), which is a generic implementation of the modular extension scheme, a simple arithmetical idea allowing to verify the integrity of a delegated computation, including over homomorphically encrypted data: https://pypi.org/project/thc/

Reply


@helloSam43 4 days

Replying to @jarenmf 🎙

I enjoyed working w/ R and R studio to have the code and published pdf always under version control and found that the reproducible research community added readers and feedback and interest from outside my narrowly-defined subject area. I didn't enjoy that frequent package updates would break everything. What kind of coding eco-system does your work exist in?

Reply


@omarhaneef 7 days

Replying to @jarenmf 🎙

There are a lot of difficult questions posed here on HN but this is not one of them: unequivocally, you should publish the code.

It is better for science, it will be better for you and it will be better for people who want to play with your code.

Publishing is a form of advertising what you did, and helping others reproduce it makes it go viral and is a testament to how much they care. It can only help your career.

You’ll definitely get people who nitpick the code. This won’t hurt and it may even help in its own way.

Reply


@derefr 7 days

Replying to @jarenmf 🎙

As an amateur who reads journal papers (maybe not the audience you're most concerned about), the two most important things to helping my understanding of the paper's results are, in order:

1. a program that I can run against the data in the paper (where I can modify the data to see how that changes the results the program generates); and

2. the source code to that program, that I can read to understand what it does.

For #1, I'd encourage you to publish something like a Docker image of your built binary, to a permanent public Docker image host; to use that Docker image version of your program to do the actual experiment/data processing for your paper; and then to cite, in your paper, the specific fully-qualified Docker image ID (e.g. hub.docker.com/foo/bar@sha256:abcdef0123...6789) that was used to create the results.

I would also encourage you to, if possible, publish your data in some repository, e.g. GitHub; and to cite the data using a fixed hash (e.g. Git commit hash) as well.

With these two pieces of information, anyone can easily do the simplest possible kind of "reproduction" of your results: namely, they can fetch the same Docker image used in the paper, and then run it against the same data used in the paper, to — hopefully — produce the same results shown in the paper.
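The "fixed hash" idea for data can be sketched in a few lines of Python (the filename and workflow here are hypothetical illustrations, not from the comment above): record the SHA-256 digest of the exact data file used for the paper, so readers can check they are reproducing against the same bytes.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA-256 hex digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Stand-in data file for the sketch; your real dataset replaces this.
with open("data.csv", "w") as f:
    f.write("x,y\n1,2\n")

# This is the digest you would cite in the paper alongside the data link.
cited_digest = sha256_of("data.csv")

# A reader re-running the analysis first verifies the bytes match:
assert sha256_of("data.csv") == cited_digest
```

A Git commit hash serves the same purpose for a whole repository; a per-file digest is just the simplest form of the same guarantee.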

---

As for #2...

If you're really worried about "trade secrets", you can just solve #2 by making the code itself only "available upon request."

But don't underestimate the number of people in your field who say they're hoarding their code for reasons of "competitive advantage", but who are really doing so out of personal shame at the state the code is in, and fear that a bug might be found there that will invalidate their result.

These people are, IMHO, not embracing the spirit that led them to become scientists. You should want any bugs in your papers — including in the code — to be found! That's what the pursuit of (academic) science is about — everyone checking each other's work so that we can all believe more strongly in the results!

You don't need to clean up your code. Maybe get an "alpha reader" to go over it first, like self-published authors do, if you're worried about nitpickers. But the only thing code really "needs" to be valuable, is to compile and run and do something useful.

Personally, all I'd want from your repo is for there to be a Dockerfile in there that will, within its fiddly little internal build environment, manage to output the exact Docker image cited in the paper.

If I cared about modifying the code, I could take the rest from there.

Reply


@motiejus 7 days

Replying to @jarenmf 🎙

I highly recommend adding it. It doesn't have to be exposed, but is super useful for anyone who will want to reproduce or build upon your work later.

You can embed this to the PDF, e.g. see section A.1 [1] for how.

[1]: https://raw.githubusercontent.com/motiejus/wm/main/mj-msc-fu...
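For what it's worth, one way to do this in pdfLaTeX is the `attachfile` package; a minimal sketch, assuming you have bundled the code as a tarball (the filename is hypothetical):

```latex
% Sketch: embed the source archive into the PDF itself, so the code
% travels with the paper. Requires pdflatex and the attachfile package.
\documentclass{article}
\usepackage{attachfile}
\begin{document}
The full source code used for this thesis is embedded in this PDF:
\textattachfile[mimetype=application/gzip]{code.tar.gz}{code.tar.gz}
\end{document}
```

Readers can then extract the attachment from any PDF viewer that supports embedded files, with no external link to rot.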

Reply


@ai_ja_nai 7 days

Replying to @jarenmf 🎙

You absolutely should. Papers should always have reproducible code, otherwise there is no practical usage for the community. Crap is better than nothing.

Reply


@nerdponx 7 days

Replying to @jarenmf 🎙

Consider that "cleaning and organizing" your code means that it is no longer the code that actually produced the results in your paper!

The fact that your code is a mess means that it might be buggy; if other people can see your code, someone might find a bug in it. As you said, this is a good thing for open science, and makes your work easier to reproduce.

Reply


@stevenalowe 7 days

Replying to @jarenmf 🎙

publish; bad code is far better than no code

someone might clean it up for you, too

Reply


@freddref 7 days

Replying to @jarenmf 🎙

I'm in a field where it's never been so easy to replicate an experiment, and I really wish more people would make their code available. It can be liberating to put your code out there, sort of setting it free, and you'll be much more forgiving of other people's code quality in the future.

The negatives are overestimated; it's unlikely many people will read the code.

Reply


@periheli0n 7 days

Replying to @jarenmf 🎙

I have published research code a couple of times. I did this out of principle because I believe in knowledge sharing and collaborative science.

But to be honest, I am truly underwhelmed by the response. For several papers I created Jupyter notebooks that reproduce every single figure in the paper. It was a huge amount of work, and even though the papers with code are cited reasonably often, I've been getting only minimal feedback.

So it's really difficult to judge whether the effort of properly preparing the code pays off.

On the other hand, I have run into several papers that turned out to be not reproducible without the code. Chances are that these particular papers would not have been reproducible with the code either :D (there were just too many things not adding up). But it would have saved us a lot of time if the code had been available.

Tl;dr: make the code available, but don't invest too much time in polishing it. Hardly anyone is going to thank you.

One exception: if you want to impress future employers, polishing code is worth it. A good portfolio on GitHub can open doors.

Reply


@INTPenis 7 days

Replying to @jarenmf 🎙

I don't think you have to do any work at all on it if you don't want to. Just release it and let people fork it, let it grow.

Reply


@chmod775 7 days

Replying to @jarenmf 🎙

> as a competitive advantage so in this case publishing the code will be removing the competitive advantage.

I wish it wasn't viewed as a competition in the first place.

Reply


@f6v 7 days

Replying to @jarenmf 🎙

1. It might seem to you that the paper is enough to reproduce the results, but in my field, more often than not it isn't. Hell, even a software version change can produce different results with the same params and seed. So don't exclude the possibility that you're biased.

2. Code IS a competitive advantage. Sometimes you'll reach out to the author to ask for clarification, and after some back and forth they'll just suggest you send them the data and put them on the paper, because they don't really want to disclose the details of the method they've previously published.

3. I don’t think you’ll have issues if you share less than perfect code. Most reviewers are as bad at production code as you are.

All in all, I think sharing code advances science. Yes, there’s gatekeeping, tricks to keep the knowledge inside the lab. But didn’t you choose the field because you want to advance the knowledge, help humankind? Making your research more reproducible by sharing the source code is a step in that direction.

Reply


@amjaeger 7 days

Replying to @jarenmf 🎙

Definitely publish! Check out the Journal of Open Source Software - https://joss.theoj.org/

Reply


@modeless 7 days

Replying to @jarenmf 🎙

> The paper itself is enough to reproduce all the results.

Every researcher thinks this, and it's always wrong. If you care about scientific progress, publish the code and data.

Besides, available code should cause more people to look at your work and ultimately cite it.

Reply


@rvanlaar 7 days

Replying to @jarenmf 🎙

What a great question. You've come to the right community.

My main concern would be to make sure there are no passwords or secret keys in the data, not how it looks.

You'll open yourself up for comments. They may be positive or negative. You'll only know how it pans out afterwards.

Is the code something that you'll want to improve on for further research? If so, publish it on GitHub. It opens the way for others to contribute and improve the code. Be sure to include a short README saying that you welcome PRs for code cleanup, etc. That way you can turn comments criticizing your code into a request for collaboration. It really separates helpful people from drive-by commenters.

Reply


@troutwine 7 days

Replying to @jarenmf 🎙

This is a question near to my heart. I'm not an academic but a practicing systems software engineer. A good chunk of my work is sourcing interesting academic ideas and trying to turn them into practically useful software. Papers that don't release their source code are often not as reproducible as the authors think. Perhaps there's a bug that the results depend on, perhaps the implementation is very specific to the context the software runs in, perhaps the paper gives _most_ of what you'd need to re-implement but the fine details are missing. I've seen them all.

In a very real sense, unless the paper has a result so compelling that I can't ignore it, if there's no published source code -- even an obvious prototype! -- I'll pass it by. I'm not alone in that in my line of work. Industry folks might also be more willing to accept prototype code than academic folks, I dunno.

Worth considering, I guess, if you're interested in your work crossing the academic/industry boundary smoothly.

Reply


@foo92691 7 days

Replying to @jarenmf 🎙

Yes, absolutely. Next question?

Reply


@mattbaker 7 days

Replying to @jarenmf 🎙

In undergrad I was so grateful whenever a CS paper came with code. It helped me learn and comprehend so much better and I always wanted to thank the person who did it (sometimes I even did if their email was there :)).

You might be doing a young student a solid :D And don’t worry about cleaning it up!

If you use GitHub you could even disable Issues and have a note saying you don’t accept pull requests (in case you’re worried about support burden).

Reply


@PragmaticPulp 7 days

Replying to @jarenmf 🎙

Is your goal to help advance the science and our general knowledge? Publish the code. You don’t even need to clean it up. Just publish. Don’t worry about coding style nitpicks. Having the code and data available actually protects you from claims of fabrication or unseen errors in hidden parts of your research.

On the other hand, if your goal is only to advance your own career and you want to inhibit others from operating in this space any more than necessary to publish (diminish your “competitive advantage”) then I guess you wouldn’t want to publish.

Reply


@phdelightful 7 days

Replying to @jarenmf 🎙

I assume computer science results without published artifacts to be fake. When it's so easy to publish and run code, if the researcher can't even do that then I assume the code does not work and thus the results must be fabricated. If your work has trade secrets or something, research publication with peer review is the wrong way to distribute your results.

Computer engineering on novel systems is a bit harder, but a /complete/ spec of the system (enough for someone to precisely rebuild it) should be published in that case. Remote access on request to the prototype would be better.

Reply


@whatever1 7 days

Replying to @jarenmf 🎙

Publish your code only after you have made the journal publications / conference papers. I have witnessed a researcher getting robbed of his work when another researcher took his almost-complete code from GitHub and submitted it to a journal first.

Now both of the researchers have to be cited, but only one of them did the discovery work.

Reply


@godmode2019 7 days

Replying to @jarenmf 🎙

You could also paint a picture no one else will ever see.

Personally, I hate it when academics do not publish their code. Some academics publish the code but withhold the pretrained model or the dataset, leaving them to collect dust on their computer.

People who publish code, datasets and models become the core building blocks of future work. People who don't fade away; nobody remembers their names.

Reply


@shmageggy 7 days

Replying to @jarenmf 🎙

> Besides, many scientists look at code as a competitive advantage so in this case publishing the code will be removing the competitive advantage.

This is probably wrong, depending on the field. At least in machine learning, the papers that get cited the most are those that other people can easily pick up and work on. They become the basis for future work, get cited as baselines more often, etc. Publishing research ML code is a competitive advantage.

Reply


@X3Xs4gF9oB 7 days

Replying to @jarenmf 🎙

Publishing replication data and code increases the impact and citation rate of published work. For a literature review, see https://osf.io/kgnva/wiki/Open%20Science%20Literature/#Open_...

Reply


@baby 7 days

Replying to @jarenmf 🎙

1. create a github repo

2. push it there

Reply


@lumost 7 days

Replying to @jarenmf 🎙

The economic incentive of science is for your work to be replicated and cited. Not publishing the code and data means your work is harder to reuse for subsequent studies and will hurt citations.

If it's uncommon to release code then I'd doubt anyone in the peer review will review it.

Reply


@maurits 7 days

Replying to @jarenmf 🎙

Given the enormous amount of papers that come out, I personally tend to read papers that come with code (and data) first.

For me, it shows the authors are confident yet also open to critique. Which is a wonderful thing.

Secondly, I usually need the code to really understand the paper.

Reply


@ricogallo 7 days

Replying to @jarenmf 🎙

Publish the code. At worst no one will look at it, at best you will draw more attention to your work and maybe get some good tips.

Reply


@bjourne 7 days

Replying to @jarenmf 🎙

> it will increase the surface for nitpicking and criticism

This is unfortunately true. In one of my articles I linked to my GitHub repo where I had implemented the algorithm in C. One of the reviewers complained that I had used C instead of C++. It's probably advisable not to publish code before peer review.

Reply


@yellowapple 7 days

Replying to @jarenmf 🎙

Well I'm no scientist, but the one thing I do know about science is that if it can't be replicated then it ain't really science. Seeing as how the code is pretty important for expediently replicating your work, it seems like doing so in the interest of facilitating that replication takes precedent over any sort of "competitive advantage". If that sort of transparency ain't common in your field, then maybe this is your opportunity to lead by example and change that?

Reply


@ThePhysicist 7 days

Replying to @jarenmf 🎙

I published some of my academic code, like a tool for simulating superconducting circuits [1] and a tool for managing lab instruments in quantum computing (or other) experiments [2]. They're super niche, but both tools have found users in other labs who even keep developing them (at least [2] has). And it's nice to look at your code after 10 years and realize how much you've grown as a programmer :)

[1]: https://github.com/adewes/superconductor
[2]: https://github.com/adewes/pyview and https://github.com/adewes/python-qubit-setup

Reply


@_stefanix 7 days

Replying to @jarenmf 🎙

There are scientific venues where the focus is the software architecture and the software product.

I previously published in one of them (SoftwareX, by her majesty Elsevier the Evil) and I wish there were more venues that brought value and recognition to the code we develop in the course of research.

Reply


@nomilk 7 days

Replying to @jarenmf 🎙

> it will increase the surface for nitpicking and criticism

Anyone who programs publicly (via streaming, blogging, open source) opens themselves up for criticism, and 90% of the time the criticism is extremely helpful (and the more brutally honest, the better).

I recall an Economist magazine author who made their code public, and the top comments on here were about how awful the formatting was. The criticism wasn't unwarranted and, although harsh, would have helped the author improve. What wasn't stated in the comments is that by publishing their code, the author had already placed themselves ahead of the 95% of people in their position who wouldn't have had the courage to do so. In the long run, the author will get a lot better and much more confident (since they are at least more aware of their weaknesses).

I'd weigh up the benefits of constructive (and possibly a little unconstructive) criticism and the resulting steepening of your trajectory against whatever downsides you expect from giving away some of your competitive advantage.

Reply


@inetknght 7 days

Replying to @jarenmf 🎙

Your code should have exactly the same license and distribution as your paper. Anyone who tells you different is simply wrong.

If you published a paper that uses information from the code then yes you absolutely must publish your code. Otherwise you're contributing to the decline of science via the opaqueness of papers and irreproducibility problem.

Reply


@drdec 7 days

Replying to @jarenmf 🎙

Publish it on GitHub (or GitLab or your code hosting service of choice).

Then answer any criticism about it by asking for a PR.

To preempt code-style complaints, find a code formatter for your language and run everything through it first.
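For a Python codebase, such a one-shot pass might look like the sketch below; it assumes the `black` formatter is available via pip (other languages have equivalents such as `clang-format` or `gofmt`):

```shell
# Install an auto-formatter (Python example) and normalize a messy file.
python -m pip install -q black
printf 'x=1\ny = [1,2 ,3]\n' > demo.py
black -q demo.py   # rewrites demo.py in place with a consistent style
cat demo.py        # now: x = 1 / y = [1, 2, 3]
```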

Refer to the repository in your paper, but don't put in a link; a little friction discourages casual readers who don't really need the code from popping over too easily.

Reply


@waoush 7 days

Replying to @jarenmf 🎙

In my experience, typically researchers only publish the relevant and/or core algorithms of their research. If you would like, you can always publish the code to Github (if it isn't already), and reference it in your paper.

If it is too much work to refactor the code for publishing, you can also just publish pseudocode.

I don't think anyone will nitpick or criticize coding style or things like that unless it is particularly egregious (i.e. naming variables something vulgar, etc.). The point of research papers is to communicate new and valuable findings. If people at a conference or journal are nitpicking things like that, you may want to find a different place to submit your work.

I don't know what your field is, but in Computer Science I can't say I have ever known people to consider their code a competitive advantage. The only time they might shy from releasing code is when they think they can commercialize it or something.

Reply


@mataug 7 days

Replying to @jarenmf 🎙

My take on this is that some code is 10X better than no code.

There have been times when I've had to abandon incorporating an idea presented in a research paper because the paper doesn't have enough information for me to implement it in code. I could've made a lot of progress with some proof of concept code, even if it wasn't clean.

Reply


@kesor 7 days

Replying to @jarenmf 🎙

What benefit would you receive from publishing your code? Would it earn you some privilege, more money, and/or more reputation?

If the answer is no, and publishing would mostly cost you time and effort, then don't publish.

If the answer is yes, then consider the return on investment: if the reputation/money/whatever you gain from publishing exceeds what you expend on the work of publishing, then publish; if not, don't.

Reply


@analog31 7 days

Replying to @jarenmf 🎙

Depends on the climate of the field you're in, and where you're at in your career. There are fields where entire research groups routinely harvest preliminary ideas from graduate student publications, and then finish them and rush to publication before the student realizes what's happened.

I'd say a grad student owes nobody anything until they finish, because they're bearing the greatest risk of losing priority, and the openness of science is being used against them. Nothing is lost by waiting until the degree is in the bag before sharing. Then clean it up and use it as part of your portfolio, or append it to your thesis. Advancing science after you've secured your career is a fair compromise.

I love open source and open science, but I also look back on my own graduate studies: I chose a topic that was protected by virtue of a large capital investment plus domain knowledge that was not represented in code. Also, my thesis predates widespread use of the Internet. ;-)

Reply


@Dowwie 7 days

Replying to @jarenmf 🎙

If you think you've got something that will give you a competitive advantage, seize that advantage. Otherwise, it will be no more than source code on a thumb drive whose encryption password you'll eventually forget, or that gets damaged during a move, or is lost when you stop paying for your cloud storage, or when you re-partition the wrong part of your hard drive, whatever.

Reply


@taubek 7 days

Replying to @jarenmf 🎙

If you are for open science (https://en.wikipedia.org/wiki/Open_science_data), go ahead and publish it. Would you ever publish the code on some Git platform? If so, this would be the equivalent. A lot of researchers don't want to give their data to the public, but by locking their data away they are just making it harder for others to confirm or improve their findings. I guess sometimes there are legal issues behind that, and sometimes it is pure ego.

Reply


@exdsq 7 days

Replying to @jarenmf 🎙

“Besides, many scientists look at code as a competitive advantage so in this case publishing the code will be removing the competitive advantage.”

While I appreciate this is true, it’s also quite sad. Science shouldn’t be a competitive sport to increase a couple metrics like publications and citations such that useful parts of replicating and extending studies aren’t shared. :(

Reply


@chrisseaton 7 days

Replying to @jarenmf 🎙

> it will increase the surface for nitpicking and criticism

You're supposed to welcome criticism and 'nitpicking' as a scientist.

Reply


@jhrmnn 7 days

Replying to @jarenmf 🎙

I'm a scientist too, in computational chemistry. To me, releasing the research code that accompanies a paper is an imperative. Increasingly, journals or individual (peer) reviewers demand it. It's essential for reproducibility. I consider the work that goes into making the code releasable simply part of the job.

Reply


@osamagirl69 7 days

Replying to @jarenmf 🎙

There are about 100 comments saying the same thing already, but I would highly suggest publishing the code:

1. It gives your work more visibility. If there is an easy git-clone route to reproducing your work, it offers a low-effort starting point for people to build upon it, which means they are more likely to use it. Plus you get free citations from anyone who touches it.

2. There is no reason people should be hoarding code in academia; the only reason they do it now is a sort of prisoner's-dilemma problem (the first person to publish their code had to start from scratch, so they feel possessive and let it die when they graduate). Every researcher who releases their code chips away at the problem and pushes the community to be more open with their code, which is intrinsically more efficient.

3. If you get lucky and the community adopts your code, it will be viewed very positively by future career-advancement committees: you become 'the guy who wrote X'.

4. When I started in academia I based my codebase on an existing publicly available code, which saved me a huge amount of time. I built upon it (not extending the base code, but using it as a module to integrate experimental measurements with the simulation tools I wrote from scratch) during my PhD, and when I graduated I handed a VirtualBox image with the whole mess (yay free code; this wouldn't have been possible with non-free code) off to my successors, people in new groups, etc. It ended up being the base of an entire new research group at a different university. Every once in a while I get an email asking for help, or a notification that someone cited the code.

Reply


@asoplata 7 days

Replying to @jarenmf 🎙

Absolutely, yes. The other comments here have some fantastic reasons for doing this, and several do a good job of weighing the pros vs cons.

The paper alone is almost never enough to fully reproduce the result. I've been bitten by this almost every time I've tried to implement someone else's computational model. It comes down to this: relying only on your paper to explain your code leaves a LOT of room for errors. I've experienced all of the following when trying to implement someone else's computational work without published code:

    1. Despite your best efforts, you include fundamental, result-breaking typos in the equations you write up to explain the math of what you're doing. This WILL happen to you at some point in your career, and in my experience, it's a problem in >>50% of computational modeling papers.
    2. There are assumptions in the logic of the code that you don't include in the writeup, since they're obvious to you, but you don't realize that someone else trying to understand your paper won't necessarily be starting with those same assumptions. This happens frequently with neural models that use complicated synapse-computation schemes.
    3. Your codebase may be big enough that you think code part X works a certain kind of way from memory, but you forget that you changed the logic late in the project to work in a different way.
    4. Publishing your code at the time of publication prevents "Which version did I use?" problems. It's very common for people to keep working on their science code for new projects without saving/tagging the SPECIFIC version used for the actual paper. The result is that even the author doesn't know what exact values produced the results in the paper!
Any "competitive advantage" has to be weighed versus "positive exposure". If your code is the primary research object (as opposed to the data), then it's technically possible that someone may grab your code, extend it to do the next, interesting use of it, and then scoop you before you can do it yourself. However, even if this happens (which it probably won't), consider the following:

    1. You can't build a successful career out of just small extensions to the same piece of code, so that codebase won't be the main kernel of your career; your understanding of it will be.
    2. For every 1 person that tries to use that to scoop you, IMHO there's going to be at least 10 other people who see your code and reach out to you for help with it, or just to ask a question about it, or reach out for potential collaboration! In other words, depending on the field, if you publish the code, I think you're likely to gain new/future collaborators at a MUCH faster rate than people who compete against you. You'll be surprised at how many researchers on the other side of the planet are interested in your software!
    3. Even if someone scoops you with your own code, if they give any indication it came from you, you still get to count that as a publication that built off of your software work when you're applying to jobs :)
    4. At least with US federal government funding, this is gradually becoming required anyway, and I believe/hope it will soon be the standard.
Finally, don't fret about polishing/cleaning/organizing the code, especially style. For others trying to reproduce your results or investigate how you did things, what matters is that your code runs "correctly", i.e. the way you ran it to get the results that you did. One idea: publish it "as is" for the CORRECTNESS of the paper, add a git tag indicating "original version", and THEN clean it up on GitHub/wherever. That way, any new "organizing" of the code can't silently break something, which would be counterproductive, and when people visit your code page the first thing they see is a nicely organized version you've had time to test. Honestly, if you care about this at all, your code is probably more organized than 95% of research code out there; the standards of code quality in science are VERY low, which is completely different from private-sector software engineering.
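The "git tag the original version" idea costs two commands. A minimal sketch in a throwaway repository (the tag name `paper-v1` is made up; adapt as needed):

```shell
set -e
# Throwaway repo standing in for your research codebase.
repo=$(mktemp -d) && cd "$repo" && git init -q
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "code as run for the submitted results"
# Annotated tag marking the exact snapshot used for the paper;
# publish it alongside the branch with `git push --tags`.
git -c user.name=demo -c user.email=demo@example.org \
    tag -a paper-v1 -m "original version used for the paper"
git tag -l   # prints: paper-v1
```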

* edits are for markup

Reply


@varelse 7 days

Replying to @jarenmf 🎙

You absolutely should publish the code and put it on GitHub somewhere. I did that 20 years ago on SourceForge and it backs up a lot of claims that would otherwise get dismissed as me making s** up. Plan ahead and make sure you have the receipts, because if your research ever becomes relevant you'll want them.

Reply


@anton_ai 7 days

Replying to @jarenmf 🎙

Publish it, if it's interesting people will clean and improve it. It's the beauty of open source.

Reply


@sam0x17 7 days

Replying to @jarenmf 🎙

Though you don't mention this particular issue, it often comes up, and as someone who used to work as a DoD research scientist, I will say this: I think academics are largely under the impression that they should be worried about people "taking their idea" and building something amazing with it without compensating you in some way. In reality, it is vanishingly rare that a published paper gets used for anything, by anyone, and it is even rarer by an additional order of magnitude that someone successfully tries to use something without consulting the author and/or trying to bring them along. You are the expert on the thing you have made, so if someone sees massive potential in it, they will likely bring you along. Publishing some quick and dirty research code that is able to reproduce the results of the paper can only help you in the long run.

If you want real protection of course you can always try to get a patent, but then I've got you because 90% of the people I have this conversation with are worried about people stealing their idea but don't think it is patent-worthy.

A similar analogue exists in startups: ideas are really a dime a dozen. Execution is what matters. There are millions of great startup ideas floating around -- I bet almost anyone could come up with at least a few that are viable -- but actually having the follow-through and dedication to execute that idea, that is what is challenging. I can't tell you how many people I've had calls with where the exchange is basically "I want your thoughts on this amazing idea but you have to sign an NDA first". 90% of the time these people aren't willing to go all-in on their idea and stake their career on it (hence them seeking second opinions), so it makes no sense for them to worry about me "stealing" their half-baked, unrealized idea. I say to them "would you take $3M in interest-free debt to develop this idea right now" and they say "no!" to which I say "then why should I sign an NDA?"

Reply


@fxtentacle 7 days

Replying to @jarenmf 🎙

Publish it.

Put a huge note in the readme that this is research code and only licensed for non commercial use.

Put a note on your personal homepage that you're available to hire as a research consultant for $1000 per day.

Companies who like your research will put 1+1 together. A friend of mine got hired straight out of university at a very competitive salary with this approach.

Reply


@clintonb 7 days

Replying to @jarenmf 🎙

If you publish it, please make sure it works today, and can work tomorrow. Pin the versions of any dependencies, or bundle them if feasible.

Also, include basic instructions for running your code.

I helped my wife with a replication study that should have been straightforward, and I was unable to get the code running after about a week. I don’t necessarily believe the research was suspect, but broken code does draw more suspicion.
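For a Python project, "pinned" can be as simple as committing exact versions next to the paper's code. The file below is a hypothetical sketch (package names and versions are illustrative; capture your real environment with `pip freeze > requirements.txt`):

```
# requirements.txt -- exact versions recorded at publication time
numpy==1.21.2
scipy==1.7.1
matplotlib==3.4.3
```

A README with two lines -- `pip install -r requirements.txt` plus the exact command you ran -- covers the "basic instructions" part.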

Reply


@hwers 7 days

Replying to @jarenmf 🎙



@robotresearcher 7 days

Replying to @jarenmf 🎙

Yes, absolutely.

You should have confidence in the correctness of your code if you are publishing.

If your code is a shitshow, why do you trust it? Decent code is to your own advantage even if no one else ever looks at it.

In the best case, it’s possible to build a community around your code, to wide benefit and your career benefit. I’ve seen this with several peers and students.

As a hiring manager, it’s very nice indeed to read a paper and scan the code of a fresh grad applicant.

My lab’s approach is to put the repo in public and put the hash of the relevant commit in the paper. Then you can keep developing there but readers can be confident they can get the exact code used to justify the claims in the paper.
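The commit-hash workflow is a couple of commands in practice. A sketch with a throwaway repository (names are illustrative):

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "results as submitted"
# The full hash of the current commit is what goes in the paper.
hash=$(git rev-parse HEAD)
echo "commit cited in paper: $hash"
# Later development doesn't matter: a reader recovers the exact state with
git checkout -q "$hash"
```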

An exception is if you plan to build a company around your IP. Estimate how likely that is before defaulting to keeping the code closed.

Reply


@mrintellectual 7 days

Replying to @jarenmf 🎙

Having written a couple papers myself, I think it's entirely fair to release the code as is. Code written for research purposes is obviously not suitable for production but can still serve as a great tool for others to understand and build upon your work.

Reply


@throwaway6734 7 days

Replying to @jarenmf 🎙

The code is more important than the paper

Reply


@dgacmu 7 days

Replying to @jarenmf 🎙

You've gotten a ton of feedback already, but: Please do! Don't try to make it perfect. Just publish it. As the saying goes: "The perfect is the enemy of the good." Release the code you used to get YOUR results. You can always improve it later if it turns out people end up really interested in it.

(FWIW, I'm a professor at an R1 university. I give this advice to all of my Ph.D. students and strongly, strongly encourage them to put their code out there on our github.)

Reply


@andi999 7 days

Replying to @jarenmf 🎙

Check with your admins if you are actually allowed to publish it.

Reply


@bee_rider 7 days

Replying to @jarenmf 🎙

Publishing code could be worthwhile: if, for example, your code has a commercial application and a company wants to use it, a reference implementation is handy.

Reproducibility -- I dunno. A re-implementation seems better for reproducibility. The paper is the specific set of claims, not the code. If there are built-in assumptions in your code (or even subtle bugs that somehow make it 'work' better), then someone who "reproduces" by just running your code inherits those assumptions.

Coding time -- are you sure? Professional coders are pretty good. If you have, for example, taken the true academic path and written your code in Fortran, there's every chance that a professional could bang out a proof of concept in Python or C++ in about a week (it really depends on the type of code -- Eigen and NumPy save you from a whole layer of tedium that BLAS and LAPACK 'helpfully' provide). Really good pseudocode might be more useful than your actual code.

Another note -- personally I treat my code as essentially the IP of my advisor (he eventually open-sources most things anyway). But do check on the IP situation if you want to open-source it yourself. If you are working as a research assistant, some or all of your code may belong to your university. They probably don't care, but it is better to have the conversation before angering them.

Reply


@peterburkimsher 7 days

Replying to @jarenmf 🎙

You could add the code as a PDF attachment, so it's available directly with the paper, but somewhat "hidden". That also answers the question about where to host it.

https://alltamedia.com/2014/04/14/how-to-make-a-link-or-butt...

Reply


@bdowling 7 days

Replying to @jarenmf 🎙

This is the wrong forum to ask this question because the audience here is mostly in favor of open disclosure of information and open source licensing of code, which always comes at a cost to someone. For example, publishing your code may have significant impact on whether or not you can obtain ownership protection of your inventions/discoveries. If you are interested in protecting these interests, then you should consult with an intellectual property lawyer.

Reply


@jaclaz 7 days

Replying to @jarenmf 🎙

As always I may be wrong, but the (admittedly very few) times I find an article/paper based on or revolving around code that is interesting/useful for some purpose, I read the "code is available on request" note (or similar) like the (in)famous marginal note on Fermat's Last Theorem: hanc marginis exiguitas non caperet ("this margin is too narrow to contain it").

Nowadays margins are large enough and cost nothing or next to nothing, and you probably don't have any other use for your code, so what would be the advantage for you in not publishing it?

What kind of competitive advantage does it give you? (What many scientists think might not be as relevant as what you think about this "competitive advantage" specifically in your own case/field.)

About "cleaning it", why?

I mean, if it works as-is (even if it's "ugly"), it still works; what if, in the process of "cleaning it", you introduce a bug of some kind?

Unless you plan to also re-test it after the cleaning, I guess it would be better to not clean it at all.

Reply


@1MachineElf 7 days

Replying to @jarenmf 🎙

Was the research funded with public money? If so, then the public interest would be a reason to publish the research code.

Reply


@Mageek 7 days

Replying to @jarenmf 🎙

Research is not a zero-sum game, it’s about bringing useful knowledge to everyone. If releasing your code aids in understanding what you’re contributing, then by all means please contribute! You don’t even need to spend a ton of time cleaning it up. Releasing it “as is” is fine. Hoarding your code as a defense against criticism goes against the entire purpose of open academic work.

Reply


@petters 7 days

Replying to @jarenmf 🎙

> Besides, many scientists look at code as a competitive advantage so in this case publishing the code will be removing the competitive advantage.

One of my most cited papers is a relatively uninteresting one we wrote for a conference competition. But we published the code, so it is easy to compare an alternative approach against ours. That means citations.

So it can work for your benefit as well.

Reply


@moralestapia 7 days

Replying to @jarenmf 🎙

>But on the other hand it's substantially more work to clean and organize the code for publishing

It's better than nothing, and it's also the only way for others to reproduce your results. I am surprised you were not asked to do so by whatever journal you chose to publish in.

>many scientists look at code as a competitive advantage so in this case publishing the code will be removing the competitive advantage

LOL, what!? What is this crap about "competitive advantage"? Are you privately funded? Then it's fine. If you're funded by public (i.e. government) money, you are (at least ethically) obliged to share your work with everybody.

Reply


@UncleOxidant 7 days

Replying to @jarenmf 🎙

Emphatically YES. Put the code on GitHub. It doesn't have to be perfect, especially if it would take someone two months to "get it right" from the paper alone. I've been involved in projects where we were trying to reproduce results from some paper, both with code and without. The description of an algorithm in a paper can sometimes be unclear; often, reading the code makes the description much clearer. In the cases where no code is provided, it's that much harder to reproduce the results. You want to make it as easy as possible for others to reproduce your results: give them your code, put it in a GitHub repo. If they spot discrepancies between the code and what's described in the paper, all the better -- you can use that feedback to improve both.

I'll add: I think that we need to change the mindset in academia about code. If code was involved in producing the results in the paper that code should be considered part of the paper and (at least) as important as the text of the paper. (Same for data)

Reply


@NamTaf 7 days

Replying to @jarenmf 🎙

Disclaimer: not an academic, and whilst my undergrad thesis included code, it was so broken that when others saw it I had nothing to lose except my pride.

Personally, I would. Open source is a form of peer review, and if you want to stand by your paper as peer-reviewable then I believe the code should be included in that. More researchers need to open their code up to peer review, because research code tends not to have the same robustness against mistakes (through coding conventions as well as tests) as professional software development. I shudder to think how many papers have flawed results that no one realizes and that are just accepted, because no one can spare the effort of rebuilding the code from scratch, without any prior reference, in order to verify them.

I don't think you need to clean it up. You're not competing in a coding-elegance contest; you're allowing someone to find bugs, if they exist, and point them out, just as they would when peer-reviewing your paper.

More cynically, spaghetti code probably helps as a defense against people ripping off your code, so if you're worried about your competitive advantage then not cleaning it up is a form of security through obscurity :)

Reply


@ppod 7 days

Replying to @jarenmf 🎙

An excellent paper on this issue here: https://aclanthology.org/J08-3010.pdf

Agree with other comments on CRAPL, but you should release it.

Reply


@d--b 7 days

Replying to @jarenmf 🎙

Maybe do it and see what happens. If something bad then don’t do it again…

Reply


@rlewkov 7 days

Replying to @jarenmf 🎙



@taneq 7 days

Replying to @jarenmf 🎙

Please do! Maybe the paper is technically enough to reproduce the results but if other researchers can start from a working example, they can both verify your results and extend them with more original research far faster.

While any published code receives some nitpicking and bikeshedding, most academic code is terrible so unless you literally use random joke/meme variable names as your only 'documentation' (I wish I were joking) you're not going to look bad to anyone who matters.

Reply


@larrydag 7 days

Replying to @jarenmf 🎙

Here's a good example: Fisher's iris flower data set was released with his 1936 paper, where it was used to demonstrate his discriminant analysis. That data set has since been used over and over to illustrate cluster analysis and segmentation, and many statistics teachers use it in their curriculum. You never know where the research could lead to growth and development in a field.

https://en.wikipedia.org/wiki/Iris_flower_data_set

Reply


@del82 7 days

Replying to @jarenmf 🎙

> it's substantially more work to clean and organize the code for publishing, it will increase the surface for nitpicking and criticism (e.g. coding style, etc).

Matt Might has a solution for this that I love: Don't clean & organize! Release it under the CRAPL[0], making explicit what everyone understands, viz.:

"Generally, academic software is stapled together on a tight deadline; an expert user has to coerce it into running; and it's not pretty code. Academic code is about 'proof of concept.'"

[0] https://matt.might.net/articles/crapl/

Reply


@bipson 7 days

Replying to @jarenmf 🎙

This whole mindset is so shockingly wrong from an academic perspective.

Research based on or involving code/models/algorithms should always be accompanied by a code drop. Nobody expects the code to be of good quality.

Everything else is not reproducible, and against the scientific ethos (IMO).

I read so many papers that claim incredible results, wonder how they implemented their models in this particular simulator (close to impossible with only what's out there), only to find there is just nothing to be found, anywhere. No repo, no models, no patch. NIL.

Sending an E-Mail? No response.

Further, anyone could just claim anything this way. Why bother doing any real work?

What if there is a small error in the code?

Wouldn't it be better to know that? In a scientific sense, searching for "the truth"?

Reply


@Cynox 7 days

Replying to @jarenmf 🎙

Just do a super-minimal cleaning and upload to Zenodo or similar, then put the DOI for the code and input/output files somewhere in your paper. 99% certain your reviewers will not bother to look at your code. Ten years from now, someone new looking into the same topic gets a leg up. Don't feel obligated to update, clarify, or even think about the code ever again. If you want to build a community or something, then by all means go for GitHub, but providing code along with your paper should be automatic and quick, not an unwanted burden.

Reply


@Entalpi 7 days

Replying to @jarenmf 🎙

Yes.

Reply


@overgard 7 days

Replying to @jarenmf 🎙

I don't know your field, but personally when I read a paper the code makes things 100x clearer and resolves my questions. Are you afraid people will use your code?

Reply


@emmericp 7 days

Replying to @jarenmf 🎙

Yes, you should publish it. Don't bother cleaning it up if you don't feel like it. No one will judge you for the code quality.

Published terrible code is far better than unpublished code.

Reply


@nicec2 7 days

Replying to @jarenmf 🎙

Twice I have published my research code, and both times I found that many other papers/projects plagiarized my work without giving me any credit. This happens way more than you would think, especially if you are working under a less-known advisor at a less-known university.

As the other comment said, if you care about "advancing the science", and won't mind stuff like the above happening, then go for it. In my experience, it is not worth it.

Reply


@pppoe 7 days

Replying to @jarenmf 🎙

If your field is not embracing open source yet, you should go for it ASAP. I believe the field will eventually recognize the benefits and move in that direction, and the sooner you do it, the larger the impact you will make.

Reply


@smoyer 6 days

Replying to @jarenmf 🎙

Can you provide a description of what the software does and what language(s) it uses?

Reply


@Siira 7 days

Replying to @jarenmf 🎙

Publish it under AGPL, after your paper has been accepted. If criticism surfaces after your paper has been published, great, you can now write a paper about your V2.

Science progresses by criticism, after all.

Reply


@CJefferson 7 days

Replying to @jarenmf 🎙

I've posted a huge amount of academic code (I've linked to a small number at the end). I think you should, but it won't help advance your career immediately. However, I still think it's better for science.

What is useful is if you can produce code people can build on and do their own cool stuff with -- then they will cite you. However, getting something to a state where it is tested for all reasonable inputs, has some basic docs, etc. is a hard undertaking.

https://github.com/minion/minion (C++ constraint solver)

https://github.com/stacs-cp/demystify (Python puzzle solver)

https://github.com/peal/vole (Rust group theory solver)

Reply


@sjg007 7 days

Replying to @jarenmf 🎙

Yes you should publish your code.

Reply


@sfifs 7 days

Replying to @jarenmf 🎙

Please publish. Even if you are filing patents or working on some other commercial licensing, you could publish under a source-available license. Just yesterday in the 3D-printing subreddit I saw that some academics had developed an interesting approach to segmenting large panel sections and posted a paper. A number of people were interested in trying it, but no source code seems to have been published, so I just moved on, as I'd have to go to the trouble of reimplementing the paper even to try it.

Reply


@otherme123 7 days

Replying to @jarenmf 🎙

Genome Research made me publish the code used for the data analysis, requiring a zip of the repo for archiving.

The thing is, I was only required to provide a way to reproduce the results, so obfuscated and/or uncommented code would not have been a problem. I provided clean code anyway.

Reply


@cletus 7 days

Replying to @jarenmf 🎙

Disclaimer: I'm not an academic, so I cannot speak to the benefits and implications of this from an academic point of view. There might be only downside to doing this; I don't know and don't pretend to know.

As an outsider looking in, many academic fields seem to have a reproducibility crisis. Many psychological studies, for example, cannot be reproduced yet they continue to be cited.

I personally feel that every academic paper should be reproducible: I should be able to email you about the study and get the same results. Obviously clinical trials may vary (hence the importance of statistical significance), but the real problem is data and models. If I, as someone reading your study, don't have your data, how can it possibly be reproduced? If I gather my own data, will I get completely different results? If I'm relying solely on the details you give, how do I know you haven't made a fatal assumption, or that your model isn't just buggy code?

I personally feel like a condition of all Federal funding should be that the data and any code should be made freely available.

So I support the idea of releasing it and that releasing something messy is better than releasing nothing but I can't speak to your individual circumstances.

Reply


@kyruzic 7 days

Replying to @jarenmf 🎙

Yes, you should absolutely publish it. I wrote a paper about modelling radio wave propagation through the ionosphere, and all my code for it is on my GitHub. The reason you should is simple: you are providing proof that your numbers aren't just made up.

Reply


@sampo 7 days

Replying to @jarenmf 🎙

(A) Publish your code as is, so the code is the actual code used in the paper.

> But on the other hand it's substantially more work to clean and organize the code for publishing

(B) Don't spend time cleaning code for publishing. Spend your time writing more papers.

> it will increase the surface for nitpicking and criticism (e.g. coding style, etc).

(C) Don't worry about this.

> Besides, many scientists look at code as a competitive advantage so in this case publishing the code will be removing the competitive advantage.

(D) If you do B, it will also reduce your worries about this. I am half joking.

Reply


@howLongHowLong 7 days

Replying to @jarenmf 🎙

It's sort of funny when the pro list includes "better for science" and there's still a need for a con list. There should be a scientific equivalent to the Hippocratic oath; a lot of us laypeople imagine that scientists default to "good for science" and "ease and possibility of replicability."

Reply


@programmarchy 7 days

Replying to @jarenmf 🎙

Depends if you want the public to be able to apply your research or if you want to keep the "competitive advantage" to yourself. If your research was funded by public grant money, then I think you owe it to the public.

Reply


@skulumani 7 days

Replying to @jarenmf 🎙

I always published all of my code/papers/source for my publications. I never made anything "revolutionary" but I still felt it was important to produce reproducible research, even if relatively insignificant.

This was kind of a change for my advisor, who was definitely less interested in that aspect of research. I think this is an issue in academia that needs to change.

Also, ultimately if someone wants to copy and publish your work as their own it will be relatively easy to show that and the community as a whole will recognize it.

Also, for me it felt good when another student/researcher was aided by my work.

https://shankarkulumani.com/publications.html

You don't need to clean it up or make the code presentable. Everyone knows it's research grade code. Most important part is that you have the code in a state that you can reuse in the future for another publication.

I've been saved multiple times by being able to easily go back to decade old work and reproduce plots.

Reply


@soheil 7 days

Replying to @jarenmf 🎙

If you're proud of what you achieved, then publish the code; but if you fear it's not good enough and would rather stay under the guise of the ambiguity of "imagine what could be", don't publish it. Publishing the code is the lesser evil.

Reply


@jrhawley 7 days

Replying to @jarenmf 🎙

You're right, it is substantially more work to clean and organize the code for publishing. Being open about your work does make the attack surface much larger and more likely to be nitpicked, criticized, have an error found, etc.

But it is more honest. Whatever you think about the effort required to do this, there's value in honesty.

Here is an example of my own scientific work:

- paper [0]

- preprint [1]

- GitHub [2]

It certainly wasn't easy to get all of this done. But doing this can also be a guide for others. They get to see exactly what you've done so that they don't waste months on the exact implementation. They can see where maybe you've made some mistakes to avoid them. They can see so much of the implicit knowledge that is left out of your paper and learn from it. Your code isn't going to be perfect, but what paper is, either?

Everyone will be a critic, anyway, so make it easy to pick up criticism of the stuff you feel the least confident in and do better next time. You won't get better if no one sees your code.

[0]: https://cancerres.aacrjournals.org/content/81/23/5833

[1]: https://www.biorxiv.org/content/10.1101/2021.01.05.425333v2

[2]: https://github.com/LupienLab/3d-reorganization-prostate-canc...

Reply


@bachmeier 7 days

Replying to @jarenmf 🎙

The trend is in the direction of requiring open code and data. There's been a big movement that way in economics, and most other fields will likely follow, so it's more a question of whether you do it now or in the future.

For the journal I edit, authors are required to include the code and data with the submission. The code and data are available along with the paper if it's published. We do replication audits of some papers to make sure you can take the materials they've included and reproduce every result in the paper. If not, the conditional acceptance changes to rejection. I've had cases where reviewers found errors in the code, so I rejected the paper.

On the argument that it's a competitive advantage: what does that mean? You should be able to claim results but not show where they came from? That's not science.

Keep in mind that this is a "source available" requirement, not an open source requirement. It is a matter of transparency. You have to let others see exactly what you did.

Reply


@ckemere 7 days

Replying to @jarenmf 🎙

As someone making a career in academia, I recognize both pros and cons here, but I think that the pros far outweigh the cons. Essentially, I think the question is one of identity: do you want your reputation to be "This investigator is the kind of person whose code is always available"? I know that as I evaluate job applications, funding proposals, or papers, I weigh this reputation highly, and consider the opposite, "This investigator is the kind of person who hesitates to share their code", to be a big red flag.

BUT, I have definitely encountered the situation where I read a paper, then looked at the associated code, and found that the exciting result was entirely because of a bug. The reputation, "This investigator is someone who does shoddy, error-prone work" is probably the worst possible one.

Reply


@shadowgovt 7 days

Replying to @jarenmf 🎙

It seems to me that if it would take two months to replicate the actual mechanism, you're doing the world a favor by publishing what that two months of work resulted in.

If you want to do the world a further favor, get a grad student to read it first and indicate where they cannot follow the code. In my brief stint in academia, I saw very little overlap between brilliant theoreticians coming up with novel approaches and the code to support them, and people who knew how to write readable code.

Reply


@hmate9 7 days

Replying to @jarenmf 🎙

I absolutely LOVE research that has code released with it. Just because then I can quickly explore the code and play around + tinker with it.

Like others have said, research code isn't meant to be production quality code so I wouldn't worry about "quality" in that way.

Reply


@tagh 7 days

Replying to @jarenmf 🎙

Publishing the code has some selfish benefits too: a better chance of people building on your research (and citing it).

Reply


@maxnoe 6 days

Replying to @jarenmf 🎙

That you were allowed to publish the paper without the code is the core problem here.

You shouldn't even be able to ask this question. The journal should have required you to publish the code first, or along with the paper.

Unfortunately, the number of journals that do this is still small, and even the ones that do are sometimes satisfied with "code can be obtained upon request".

Reply


@thayne 7 days

Replying to @jarenmf 🎙

My 2 cents: you should publish the code as you used it in your research, so that it's possible to review your code. If there is a bug in your code, that could impact your results, and that problem would be much harder to find/reproduce without your source code.

Reply


@didip 7 days

Replying to @jarenmf 🎙

Why not? It's a loss to humanity's progress if all researchers make it difficult to find the code and data.

Reply


@pointlessone 7 days

Replying to @jarenmf 🎙

I am not a researcher in the sense that I'm not publishing papers, but I am a consumer of research. Every day I can find the source code for a paper is a great day. Even if it's in some language I don't use, I still have something to go off of. Often it's easier to read some code to understand a method than to read the paper itself. I'm used to reading code; I do it almost every day and I'm relatively proficient at it. I'm not very good at untangling academic language, or at having to read 30 years' worth of papers to get all the assumptions made in a paper.

As an example, I found a paper that promises a method to do the very thing I want to accomplish. It's not too dense, but it skips a few crucial steps, and I've been working on coding the method for a year now (on and off, of course, but still a long time). If the code were available, it probably wouldn't take as long. The paper didn't mention that the code was available upon request, but the method was implemented in a piece of software. I eventually found it, but it was a version from just before the feature I'm after was added. I tracked down the author, and they were a great sport about cold emails but didn't have the source any more.

So yes, please publish the code. You don't have to clean it up. It worked for the paper — it's good enough. Even the most terrible code is immeasurably better than no code.

Reply


@jakupovic 7 days

Replying to @jarenmf 🎙

Yes, you should publish. Mainly because it will give you a sense of accomplishing something; also, nobody really cares :). Especially about old code, and if they do, that's even better.

Reply


@albertzeyer 7 days

Replying to @jarenmf 🎙

> The paper itself is enough to reproduce all the results.

No, this is almost never the case. It should be. But it cannot really be. There are always more details in the code than in the paper.

Note that even the code itself might not be enough to reproduce the results. Many other things can matter, like the environment, software or library versions, the hardware, etc. Ideally you should also publish log files with all such information so people could try to use at least the same software and library versions.

And random seeds. Make sure this part is at least deterministic by specifying the seed explicitly (and make sure you have that in your log as well).

Unfortunately, in some cases (e.g. deep learning) your algorithm might not be deterministic anyway, so even in your own environment, you cannot exactly reproduce some result. So make sure it is reliable (e.g. w.r.t. different random seeds).
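A minimal sketch of the seed-and-log advice above in Python (stdlib only; the file name and recorded fields here are my own choices, not any standard):

```python
import json
import platform
import random
import sys


def log_run_environment(seed, path="run_env.json"):
    """Record interpreter, platform, and seed so a run can be re-created later."""
    info = {
        "python": sys.version,
        "platform": platform.platform(),
        "seed": seed,
    }
    with open(path, "w") as f:
        json.dump(info, f, indent=2)
    return info


SEED = 42
random.seed(SEED)  # fix the seed so the stochastic parts are repeatable
env = log_run_environment(SEED)

# Re-seeding reproduces exactly the same sequence of random draws.
first = random.random()
random.seed(SEED)
assert random.random() == first
```

Publishing a log like this alongside the code lets others at least match your software versions and seeds, even when bit-exact reproduction isn't possible.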

> In my field many scientists tend to not publish the code nor the data.

This is bad. But this should not be a reason that you follow this practice.

> clean and organize the code for publishing

This does not make sense. You should publish exactly the code as you used it. Not a restructured or cleaned up version. It should not be changed in any way. Otherwise you would also need to redo all your experiments to verify it is still doing the same.

Ok, if you did that as well, then ok. But this extra effort is really not needed. Sure it is nicer for others, but your hacky and crappy code is still infinitely better than no code at all.

> it will increase the surface for nitpicking and criticism

If there is no code at all, this is a much bigger criticism.

> publishing the code will be removing the competitive advantage

This is a strange take. Science is not about competing against other scientists; it is about working together with them to advance the state of the art. You should do everything you can to accelerate that advancement, not slow it down. If such behavior is common in your field of work, I would seriously consider changing fields.

Reply


