Sci-Stalker: AI Software Tracking the Conversion of Congress Abstracts into Scientific Publications
Developed under the leadership of Dr. Emre Gecer, Sci-Stalker is automated research software that uses OpenAlex, PubMed, and CrossRef data to track whether abstracts presented at medical congresses are converted, over the years, into peer-reviewed scientific publications.
Hundreds of studies are presented at every medical congress. These abstracts often carry the latest traces of clinical practice, academic output, and scientific curiosity. Yet it is rarely known in any systematic way how many of these studies later became journal articles, which of them entered the scientific literature, and which never left the conference proceedings.
Sci-Stalker focuses precisely on this question: does a scientific study presented at a congress turn, over time, into a peer-reviewed publication? The software offers an infrastructure that follows the invisible path between congress abstracts and scientific publications, makes the fate of academic output measurable, and adds a new layer of monitoring to the research ecosystem.
What Does Sci-Stalker Do?
Sci-Stalker takes a congress proceedings book as its starting point. From this PDF document it extracts the presentations, parses out the authors, structures the titles and texts, and then searches for this data in international scientific databases.
Using sources such as OpenAlex, PubMed, and CrossRef, the software builds evidence-based matches between congress abstracts and the articles that were subsequently published. For each abstract, publication status is evaluated at four levels of evidence:
- EXACT — a strong, direct match.
- PROBABLE — a high-likelihood match.
- POSSIBLE — a possible match that warrants careful review.
- NO_EVIDENCE — an abstract for which no evidence of conversion to publication was found.
This structure makes it possible to read academic output not only in numerical terms but also in terms of levels of evidence.
Why Does It Matter?
Medical congresses are often the first place where scientific output becomes visible. A study is typically presented first as an oral talk or a poster; it is then expanded, submitted to a peer-reviewed journal, and enters the scientific literature. This process, however, does not always run to completion.
Some abstracts become strong articles. Some are published years later. Some appear under entirely different titles. And some remain in the conference proceedings and never enter scientific circulation.
Sci-Stalker brings this uncertain territory into view. It helps to analyze systematically which abstracts have been converted into publications, in which fields conversion rates are high, in which years output has been stronger, and which studies have left no trace in the literature. For this reason, Sci-Stalker is not merely a technical software project; it is an important tool for academic transparency, scientific traceability, and the evaluation of research quality.
First Application: TOTDER 2011–2024
Sci-Stalker's first comprehensive application was carried out on the congress abstracts of the Turkish Orthopedics and Traumatology Association (TOTDER). Fourteen years of congress data, from 2011 to 2024, were analyzed.
Within this initial study, 898 congress presentations and 1,375 unique authors were evaluated. After duplicate records were removed, the publication pool drawn from PubMed and CrossRef contained 100,276 scientific publications, which formed the search space for matching. The first results show how often these congress abstracts were converted into scientific publications.
TOTDER 2011–2024: First Results
- Congress years analyzed: 2011–2024
- Total presentations: 898
- Unique authors: 1,375
- Publication pool: 100,276
- Confirmed publication match (EXACT): 182 (20.3%)
- High-likelihood publication match (PROBABLE): 30 (3.3%)
- Possible publication match (POSSIBLE): 10 (1.1%)
- Abstracts with no evidence of publication (NO_EVIDENCE): 670 (74.6%)
These results are based on PubMed and CrossRef data only. The OpenAlex integration is still in progress, so the final figures are expected to rise once it is complete.
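The percentages in the list follow from dividing each count by the 898 presentations. A minimal sketch of that arithmetic, using only the figures reported above; the variable names and the combined EXACT + PROBABLE rate are illustrative choices, not part of Sci-Stalker's published output. The four counts as listed sum to 892, so the denominator here is the stated total of 898, which is what reproduces the published percentages.

```python
# Counts reported for TOTDER 2011-2024 (PubMed + CrossRef only).
TOTAL_PRESENTATIONS = 898
counts = {
    "EXACT": 182,
    "PROBABLE": 30,
    "POSSIBLE": 10,
    "NO_EVIDENCE": 670,
}


def share(count: int, total: int = TOTAL_PRESENTATIONS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {level: share(n) for level, n in counts.items()}

# Assumption: EXACT and PROBABLE are grouped here as a "high-confidence" rate.
high_confidence = share(counts["EXACT"] + counts["PROBABLE"])

for level, pct in shares.items():
    print(f"{level:<12} {counts[level]:>4}  {pct:>5.1f}%")
print(f"EXACT + PROBABLE: {high_confidence}%")
```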
How the Software Works
Sci-Stalker's logic follows the chain from congress proceedings to scientific publication step by step. First, the congress PDF file is turned into a structured presentation table. Titles, author names, institutions, and presentation details are then organized.
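One way to picture a row of that structured presentation table is sketched below. The field names, the author-line format (comma- or semicolon-separated), and the sample names are all assumptions made for illustration; the actual schema and PDF parsing logic are not described in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Presentation:
    """One row of the structured table (hypothetical field names)."""
    congress_year: int
    title: str
    authors: list[str] = field(default_factory=list)
    institutions: list[str] = field(default_factory=list)
    presentation_type: str = ""  # e.g. "oral" or "poster"


def split_authors(author_line: str) -> list[str]:
    """Split a comma- or semicolon-separated author line into clean names."""
    parts = author_line.replace(";", ",").split(",")
    # Collapse internal whitespace and drop empty fragments.
    return [" ".join(p.split()) for p in parts if p.strip()]


# Made-up example values, not taken from the TOTDER proceedings.
row = Presentation(
    congress_year=2011,
    title="An illustrative abstract title",
    authors=split_authors("A. Yilmaz ,  B. Kaya; C. Demir"),
    presentation_type="poster",
)
print(row)
```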
In the next stage, the software collects candidate publications from OpenAlex, PubMed, and CrossRef. These publications are then unified using criteria such as DOI, PMID, and title similarity, and duplicate records are cleaned out. In the final stage, a multi-step matching process runs between the congress abstracts and the publications.
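The unification step can be sketched roughly as follows. This is a simplified illustration, not Sci-Stalker's implementation: records are plain dictionaries with hypothetical `doi`, `pmid`, and `title` keys, and title similarity is reduced to an exact comparison of normalized titles rather than a fuzzy score. The DOIs and titles in the example are made up.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def record_keys(rec: dict) -> list[str]:
    """Identifiers under which a record can be recognized as a duplicate."""
    keys = []
    if rec.get("doi"):
        keys.append("doi:" + rec["doi"].lower().strip())
    if rec.get("pmid"):
        keys.append("pmid:" + str(rec["pmid"]).strip())
    if rec.get("title"):
        keys.append("title:" + normalize_title(rec["title"]))
    return keys


def deduplicate(records: list[dict]) -> list[dict]:
    """Merge records that share a DOI, a PMID, or a normalized title."""
    merged: list[dict] = []
    index: dict[str, int] = {}  # key -> position in merged
    for rec in records:
        hit = next((index[k] for k in record_keys(rec) if k in index), None)
        if hit is None:
            merged.append(dict(rec))
            hit = len(merged) - 1
        else:
            # Fill in fields the earlier record was missing.
            for name, value in rec.items():
                if value and not merged[hit].get(name):
                    merged[hit][name] = value
        # Register every identifier the merged record now carries.
        for k in record_keys(merged[hit]):
            index[k] = hit
    return merged


pool = [
    {"source": "pubmed", "pmid": "111", "title": "Outcomes of Hip Arthroplasty"},
    {"source": "crossref", "doi": "10.1000/example.1", "title": "Outcomes of hip arthroplasty."},
    {"source": "crossref", "doi": "10.1000/example.2", "title": "A different study"},
]
unique = deduplicate(pool)
print(len(unique))  # 2
```

The first two records collapse into one because their normalized titles agree, and the merged record ends up carrying both the PMID and the DOI. A production version would likely need fuzzy title comparison and a way to join groups that are only linked by a later record.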
This process does not look at title similarity alone. It evaluates author overlap, publication year, the time window, and strong identifiers such as DOI and PMID together. In this way, each match is classified according to the evidence that supports it.
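How several signals might be combined into one evidence level can be sketched as below. Everything specific here is an assumption for illustration: the thresholds, the ten-year window, the surname-based author overlap, the use of `difflib` for title similarity, and the dictionary keys. The article does not state the actual rules Sci-Stalker applies.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; a stand-in for any title-similarity measure."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def author_overlap(abstract_authors: list[str], paper_authors: list[str]) -> float:
    """Share of abstract authors whose surname also appears on the paper."""
    def surnames(names: list[str]) -> set[str]:
        return {n.split()[-1].lower() for n in names if n.strip()}

    a, p = surnames(abstract_authors), surnames(paper_authors)
    return len(a & p) / len(a) if a else 0.0


def classify(abstract: dict, paper: dict, max_years_after: int = 10) -> str:
    """Combine the signals into one evidence level (all thresholds assumed)."""
    delta = paper["year"] - abstract["year"]
    if not 0 <= delta <= max_years_after:
        return "NO_EVIDENCE"  # outside the assumed time window
    same_id = bool(abstract.get("doi")) and abstract.get("doi") == paper.get("doi")
    title = title_similarity(abstract["title"], paper["title"])
    authors = author_overlap(abstract["authors"], paper["authors"])
    if same_id or (title >= 0.90 and authors >= 0.5):
        return "EXACT"
    if title >= 0.75 and authors >= 0.5:
        return "PROBABLE"
    if title >= 0.60 and authors > 0:
        return "POSSIBLE"
    return "NO_EVIDENCE"


# Made-up abstract and paper, for illustration only.
abstract = {"year": 2015, "title": "Early results of a new fixation technique",
            "authors": ["Ayse Yilmaz", "Mehmet Kaya"]}
paper = {"year": 2017, "title": "Early Results of a New Fixation Technique",
         "authors": ["A. Yilmaz", "M. Kaya", "C. Demir"]}
print(classify(abstract, paper))  # EXACT
```

Requiring author overlap alongside title similarity reflects the point made above: a similar title alone is not treated as sufficient evidence in this sketch.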
Scientific Value and Future Applications
Sci-Stalker offers a reusable framework that can be applied to measure the publication performance of congress abstracts across different medical specialties. The TOTDER project is the first comprehensive application of this system; the same method can also be adapted for other associations, specialties, and congress series.
With this software, academic institutions, specialty associations, and researchers can begin to seek more systematic answers to questions such as:
- How many of the abstracts presented at a given congress are converted into journal articles?
- In which years does the publication-conversion rate rise or fall?
- Which types of studies are more often converted into publications?
- Which author groups or institutions show stronger sustained publication output?
- How much do congress presentations contribute to the scientific literature?
These questions matter not only out of academic curiosity but also for evaluating the quality, sustainability, and visibility of scientific output.
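As a small illustration of the year-by-year question in the list above, the sketch below computes a conversion rate per congress year from (year, evidence level) pairs. Treating EXACT and PROBABLE as "converted" is an assumption made here, and the sample rows are invented rather than taken from the TOTDER data.

```python
from collections import Counter


def conversion_by_year(rows, converted_levels=("EXACT", "PROBABLE")):
    """Return {year: share of presentations whose level counts as converted}."""
    totals = Counter()
    converted = Counter()
    for year, level in rows:
        totals[year] += 1
        if level in converted_levels:
            converted[year] += 1
    return {year: converted[year] / totals[year] for year in sorted(totals)}


# Invented sample rows: (congress year, evidence level).
sample_rows = [
    (2011, "EXACT"),
    (2011, "NO_EVIDENCE"),
    (2012, "POSSIBLE"),
    (2012, "PROBABLE"),
    (2012, "NO_EVIDENCE"),
    (2012, "NO_EVIDENCE"),
]
print(conversion_by_year(sample_rows))  # {2011: 0.5, 2012: 0.25}
```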
Validation and Preparation for Academic Publication
Sci-Stalker outputs are designed to be verifiable by independent human reviewers. Metrics such as precision, recall, and F1 can be computed for the matching layers. Cohen's kappa can also be used to measure inter-rater agreement.
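These metrics have standard definitions, sketched below in plain Python. The function names and the toy labels are illustrative assumptions; the sketch treats matching as a binary decision (match or no match) for precision, recall, and F1, and uses Cohen's kappa, (p_o - p_e) / (1 - p_e), for two raters labeling the same items.

```python
from collections import Counter


def precision_recall_f1(predicted: list[bool], gold: list[bool]) -> tuple[float, float, float]:
    """Precision, recall, and F1 for binary match decisions against gold labels."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must label the same non-empty set of items")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels, not project data.
software = [True, True, False, True, False]
reviewer = [True, False, False, True, True]
print(precision_recall_f1(software, reviewer))

rater_1 = ["yes", "yes", "no", "no"]
rater_2 = ["yes", "no", "no", "no"]
print(cohens_kappa(rater_1, rater_2))  # 0.5
```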
For the TOTDER project, a manually labeled gold-standard evaluation set of 150–300 presentations is planned. This set will be used to measure the software's accuracy and to demonstrate its methodological reliability before academic publication.
Team and Contributions
Sci-Stalker was developed under the leadership of Dr. Emre Gecer, who oversaw the software's architecture, pipeline design, and project management. Ecrin Alihoca contributed to the OpenAlex, PubMed, and CrossRef fetch engines and to the merge, match, and translation components. Gökalp Çetin worked on the author canonicalization, mapping, and normalization modules.
This team structure allowed Sci-Stalker to grow from an idea into a working research infrastructure that spans data extraction, data cleaning, scientific source crawling, matching, and validation.
Conclusion
Sci-Stalker is next-generation research software that traces studies presented at medical congresses through the scientific literature. It does not view congress abstracts merely as archived texts; it makes their scientific journey over the years traceable.
In doing so, Sci-Stalker reveals the unseen side of academic output. It makes it possible to evaluate, on evidence-based grounds, which studies have been turned into journal articles, which have left a mark in the literature, and which have remained in the conference proceedings. For researchers, specialty associations, and academic institutions that want to analyze the publication performance of congress presentations in medicine, it offers a strong starting point.
Dr. Emre Gecer
Author
There are a number of things I am interested in: film theory, screenplay mechanics, art movements, jazz, finance theory, Python, artificial intelligence, machine learning, and the areas of medicine that catch my attention. I wanted to create a space where I could take notes on them and share my thoughts. I expect I will also add moments and stories from everyday life. I believe this place will develop over time; perhaps in the long run it will turn into something else entirely. Why not?