The Problems in Science
Vision: Towards a Verifiable, Interconnected Graph of Scientific Knowledge
This hackathon is about unlocking knowledge silos to find new high-leverage hypotheses that can be tested to advance science and human benefit, beyond what any person could do alone. Together, we are building an open and distributed foundation for human knowledge. It's an attempt to Extract, Transform, and Load (ETL) the scientific record as it stands today. While the scientific method is still sound, the systems defining the incentives for academia (namely funding, tenure, and publication) have never been adapted to the digital age.
We envision something better: a new infrastructure for scientific knowledge. Think an open, distributed, resilient graph – machine-readable, machine-actionable – that actually maps out the findings and the messy process of science.
What's the dream?
Open state data networks built on iterative publication with provenance and context as the foundation for novel findings
More effective metrics built on more robust data: spotting trends, seeing how ideas really evolve, finding biases, figuring out what research actually works – things that are difficult or impossible to see clearly now.
Having knowledge structured in a machine-readable and machine-actionable format so AI can help analyze and synthesize information, expediting the transformation of human knowledge.
The ETL process for science is more than just simple manuscript parsing. Human knowledge, viewed as a collective dataset, is an abstract concept and the associated data quality problems are similarly abstract. The following perspectives on the problems of science are presented through the lens of known cleaning steps required in an ETL pipeline.
We are not asking you to solve the problems listed below, as many of those problems in the existing system are coordination and incentive design issues, born of the inability of publishing, funding, and promotion processes to adapt to the digital age. Rather, we are explaining the problems so that you can understand the necessity of the cleaning steps listed as Hackathon Ideas.
Author's Note: Full disclosure: The author of this section is not a researcher, rather a technologist who has spent the last few years trying to wrap their head around academia. The intention of this document is not to be 100% factually correct, but rather to provide necessary abstraction and comparisons needed for net-new developers to get up to speed quickly. Things may be wrong and (largely speaking) that's ok. If you spot something egregious, email the author at: erik@bio.xyz
The Current Incentive Structures
Publishing and Impact
Professors want tenure. Tenure represents job security and the freedom to pursue interesting ideas. The way to get tenure is by generating and publishing novel findings which have a long-term impact on a field. Novel findings are unique and interesting truths about the world which no one has ever thought of before. Publishing a novel finding means writing up a manuscript and submitting it through a journal.
The current way that we measure the impact of findings is based on the number of citations that a publication receives. Pieces of scientific work build on top of each other, similar to how code repositories do. A citation is the scientific equivalent of a dependency: more dependencies means more impact. A researcher's career can be (although it shouldn't be) summed up by a number called their h-index, the largest number h such that they have published h papers with at least h citations each.
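The h-index definition above can be made concrete with a short sketch (the function name here is ours, not a standard API):

```python
def h_index(citations):
    """Largest h such that the researcher has h papers
    with at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# A researcher whose papers have been cited [10, 8, 5, 4, 3] times
# has an h-index of 4: four papers with at least 4 citations each.
print(h_index([10, 8, 5, 4, 3]))  # 4
```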
Citations typically accrue over the course of years, making them lagging indicators of quality. The leading indicator that researchers are judged on is the quality of the journal they publish in. The Impact Factor is the singular metric for journals: it asks, "on average, how many citations does a paper published in this journal get?" A very small number of privately owned companies heavily restrict their publication volumes to only the most sensational yet plausible findings, and have built very high impact factors as a result. Publishing a manuscript in these high-impact journals is typically seen as a major accomplishment.
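The journal-level metric described above is, in its standard form, a two-year ratio; a toy calculation with made-up numbers:

```python
def impact_factor(citations_this_year, citable_items_prev_two_years):
    """Standard two-year journal Impact Factor: citations received
    this year to articles published in the previous two years,
    divided by the number of citable articles from those two years."""
    return citations_this_year / citable_items_prev_two_years

# A journal that published 200 citable articles over the previous
# two years and received 1000 citations to them this year:
print(impact_factor(1000, 200))  # 5.0
```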
The Importance of Retraction
Incentives define behavior, and they can be both positive and negative.
The worst possible thing that can happen to a researcher's career is something called a retraction. If a researcher publishes something which is fundamentally incorrect, they are forced to admit the error and retract the article. This means leaving it on the scientific record but putting a big red stamp on top of it saying that they were wrong.
Grant money is competitive: why would a funder give money to a researcher who was wrong when 20 other people with brilliant ideas are also asking for that money? Even a single retraction is incredibly detrimental to a researcher's career.
As someone coming from the world of technology into the world of research, it was always hard for me to fully grasp the implications of a retraction. You got something wrong. Sure, we all do that. What's the big deal? Developers get things wrong all the time.
There are two fundamental differences I see between science and technology which drive this severity at the macro level:
Job Scarcity: There's always another job for a front-end developer, no matter how badly they screwed up. There are only five jobs on the entire planet for a person who has spent their life specializing in the study of fluid flow turbulence in urban settings. Kind of hard to jump to the next one if that scarlet letter sits on your permanent record. Once again, there are countless other people competing for those five jobs.
Cycle Times: A repository can be made in a weekend. Breeding 25 generations of caterpillars takes a year, minimum. Both making a finding and fixing a problem take exponentially more time in research than they do in technology.
For the developers in the room, I would ask you to abandon what you know about GitHub and imagine that you're developing an operating system back in the 1960s. You just put your OS code onto 25 floppy disks and mailed them to your colleagues across the nation. Turns out your code had a severe bug. Imagine the severity of that bug. By the time someone realizes the bug in your code and lets you know about it, everyone already has it on their system. How many expensive mainframes have you bricked? How many months of your life is it going to take to make sure everyone gets that fixed? The stakes are much higher. While not a perfect example, it's at the least illustrative of how science works. Nobody is taking your next floppy disk.
The Problems in Research
The Challenges of Finding Impactful Results
Impactful findings are hard to produce. Let's say that a brilliant researcher has an interesting idea. They meticulously design an experiment and conduct it flawlessly. Then Mother Nature says, "No, that's not how life works". This situation is called a "null result" and it's very common. As a researcher, you do not get tenure from creating null results.
It took years and quite a bit of money to find this null result. The researcher is forced to ask themselves a question: should I publish this? The process of submitting a manuscript through a journal is often slow, laborious, risky (acceptance isn't guaranteed), and expensive. The researcher could do a few things:
If the researcher tries to publish the null result, they are being incredibly altruistic; it is equivalent to giving away their time. This doesn't happen often, so we're going to ignore it.
The File Drawer Problem
The researcher decides that it is not worth their time and effort to go through the publication system for a manuscript which says nothing of interest. They take the results, shove them into a file drawer, and never show them to the world. As new researchers come into the space, they often ask the same question. They conduct the experiment again, get the same results, and then don't publish those results either. This is what's known as the file drawer problem. The scientific record contains a large number of undocumented dead ends which humanity has spent countless amounts of money researching but still has no visibility into.
P-Hacking
The researcher, after receiving these null results, tries to think of a series of different ways to analyze the data which might yield something more interesting. They might even do a little bit of data cleaning. "The grad student bumped the machine during this run, we should take it out".
This subtle data cleaning, intended to yield more novel, publication-worthy results, is called p-hacking and has become an unfortunate standard practice in many fields. While typically harmless at the level of a single manuscript, the aggregate effects of p-hacking can have outsized impacts on the truthfulness of the scientific record as a whole.
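That aggregate effect is easy to demonstrate: if a researcher quietly tries many analyses of null data and only reports whatever crosses p < 0.05, a false positive becomes near-certain. A stdlib-only simulation (all numbers here are illustrative, not from any real study):

```python
import random

random.seed(0)

def one_null_test(n=100):
    """Simulate a null experiment: flip a fair coin n times, then
    naively test it for bias at roughly the 5% level using the
    normal approximation |heads - n/2| > 1.96 * sqrt(n * 0.25)."""
    heads = sum(random.random() < 0.5 for _ in range(n))
    return abs(heads - n / 2) > 1.96 * (n * 0.25) ** 0.5

# Each "batch" is one researcher trying 20 independent analyses of
# pure noise and keeping any that comes up "significant".
trials = 1_000
hits = sum(any(one_null_test() for _ in range(20)) for _ in range(trials))
rate = hits / trials
print(f"{rate:.0%} of 20-test batches contain a false positive")
# roughly 1 - 0.95**20 ≈ 64% is expected from the nominal 5% level
```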
Outright Fraud
It should be noted that outright fraud is not the norm, and is very much an outlier within academia. The vast majority of researchers are honest and hard-working individuals in pursuit of truth and knowledge.
Let's say this was a big and expensive study with a lot of important researchers involved, and there is pressure from leadership to find something interesting. The researcher makes the difficult choice to fundamentally alter their data. Example 1, Example 2, Example 3.
Entire communities are dedicated to exposing this problem. Because there is no provenance on information, distinguishing fraud from p-hacking from legitimate research can be incredibly difficult. As such, groups like the Center for Open Science are currently running a hackathon for the identification of results which do not replicate.
Manuscripts as the Primary Output
Papers. Just to repeat that, papers. The internet's been around for decades, and science's main output is still formatted like I have to go to Kinko's.
Scientific data coming straight off a machine in a lab is the closest thing we can find to the abstract concept of truth. Building pathways for open and interoperable data starts there.
Data Hoarding
Data is gold. It's the closest thing to ground truth. It costs a fortune to generate – maybe your grad student spent a year in the Oregon woods tracking caterpillar mating habits for $100k. That dataset? Absolutely invaluable! You could probably mine 15 papers out of it. So, do you publish the raw data alongside paper #1? Why would you? Funding is a zero-sum game. There are maybe 20 other labs competing for the same grants. Publishing your data just gives them a leg up to find novel results you could have found, get their citations up, and grab that funding. Incentives in the current system scream "Keep it closed!"
Lack of Code Publishing
Code is how data becomes results. It's essential. But it rarely sees the light of day. Why?
Surface Area for Retraction: Most scientists aren't software engineers. They shouldn't have to be! A PhD in biology should be learning biology, not debugging Python memory leaks. They cobble code together using Stack Overflow and now GPT, doing their best to make it work. That's fine! But it means the code might be… fragile. Publishing your code means those same 20 competitors can pick it apart. Find one tiny bug, one mixup in variable names? Boom. They can cast doubt on your findings, maybe even push for a retraction. It's just not worth the risk.
Support for Reproducibility: Reproducibility is the process of rerunning someone else's code and data to try and get the same results. It's surprisingly difficult to ensure functionality 5 years later. Dependency issues, datasets lost to link rot and content drift, and a multitude of other issues make reproducibility more challenging than a person would expect. When some poor grad student in another lab is asked to try to run your code 5 years later, they'll send you 25 emails asking for help. That's 25 hours of your life debugging their setup for the possibility of one extra citation. You could have spent those 25 hours writing two grants. Guess what advances your career more?
The Pressures of Being a Researcher
The academic career path, especially the climb to tenure, is brutal. The metrics used to judge success often push researchers towards behaviors that, while maybe rational for the individual, are bad for science as a whole.
Publish or Perish
It's not just a saying. Constant publication in "good" journals is the currency for grants, jobs, tenure, respect. This pressure dictates everything.
The Tenure Chase & Metrics
Getting tenure means job security and freedom. The path there? Prove you have impact. How? Publish novel findings in prestigious journals (think Nature, Science – the ones that publish maybe 5% of submissions, often the most "sensational yet plausible" stuff) and get lots of citations. Your career can feel like it boils down to metrics like your H-index (15 papers with at least 15 citations? H-index = 15). This relentless focus on novelty and impact makes it tough to justify publishing work that just confirms something, or worse, shows something didn't work.
Salami Slicing
Got one good study? Why publish one comprehensive paper when you can slice it thinly into three or four smaller ones? Maximizes your publication count! Makes your CV look better! Also makes it impossible for anyone to get the full picture without painstaking reconstruction.
Committees of Experts and Collusion
Many of the above problems come from the scarcity of knowledge and the existence of experts in a specialized field. There are only 20 people in the world who understand the work you do; 5 of those 20 have the ear of a government funder, and another 5 are the editors for the journals specific to your field. This has, at times, led to collusion within fields, which spawns problems of its own. As such, we at BioXYZ have collected a committee of experts (listed in Judges and Mentors) to see if we can help guide you through these problems.
Our Approach: ETL for Knowledge
So, what's this hackathon about? It's an attempt to Extract, Transform, and Load (ETL) the knowledge currently trapped in PDFs and file drawers, into a distributed, machine-actionable graph structure.
Extract: Pulling information – claims, methods, data pointers, entities, relationships – from papers, databases, wherever it lives.
Transform: Transforming in this context is more than taking NaNs out of a CSV. This is the hard part, the metascience part. It's about cleaning the scientific record itself. How do we represent conflicting findings? How do we flag results that failed to replicate? How do we surface those crucial null results from the file drawer? How do we link a claim definitively back to its evidence? The prizes for this event go to the people who make knowledge trustworthy and complete.
Load: Getting it all into the distributed graph so it's live, queryable, and verifiable. Owned by no-one and by all, as is the way with the decentralized web. Distributed knowledge graphs being built via distributed compute as part of a distributed agentic fleet, governed by distributed communities.
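To make the three steps concrete, here is a toy, stdlib-only sketch of the loop; every record shape and field name in it is an illustrative assumption, not a proposed schema:

```python
def extract(raw_papers):
    """Extract: pull claims and their cited evidence out of
    whatever structure the source provides."""
    for paper in raw_papers:
        for claim in paper.get("claims", []):
            yield {
                "id": claim["id"],
                "text": claim["text"],
                "evidence": claim.get("evidence", []),
                "source": paper["doi"],
            }

def transform(claims, retracted_dois):
    """Transform: the metascience step — flag claims whose source
    was retracted instead of silently dropping them."""
    for claim in claims:
        claim["retracted"] = claim["source"] in retracted_dois
        yield claim

def load(claims):
    """Load: build a queryable claim graph (a plain dict stands in
    for a real distributed graph store)."""
    return {claim["id"]: claim for claim in claims}

papers = [{"doi": "10.1234/x", "claims": [
    {"id": "c1", "text": "Compound A binds target B",
     "evidence": ["dataset-7"]}]}]
graph = load(transform(extract(papers), retracted_dois={"10.1234/x"}))
print(graph["c1"]["retracted"])  # True — the claim survives, flagged
```

The design point the sketch tries to show: a retraction becomes metadata on the claim rather than a deletion, so the record stays complete and the provenance stays queryable.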
Just to be clear on a few key points:
The systems of science are the problem, not scientists themselves. We need the existing scientific record and the brilliant minds building it. Decades of work from the existing system serve as the basis of ETL for a better system. Being a researcher is difficult during the best of times, Q1 2025 is possibly the worst of times. Please be sensitive and empathetic.
We are NOT trying to fix the existing system. The Open Science Movement has tried to fix the current system for ~20 years at this point and has never reached the velocity it deserved. We're not tweaking parameters in peer review, automating manuscript submission with JATS, or making new citation games within the old system. That's rearranging deck chairs on a sinking ship. We're rethinking human knowledge as a resource for AI.
By building these foundational blocks, maybe, just maybe, we can create tools that help science escape some of these traps and move faster towards actual understanding.