Informs Journal on Data Science - IJDS


The INFORMS Journal on Data Science (IJDS) is a peer-reviewed journal, aiming to publish top innovat

Conjecturing-Based Discovery of Patterns in Data 05/02/2024

💡 New Article Alert in the INFORMS Journal on Data Science (IJDS)

"Conjecturing-Based Discovery of Patterns in Data" by J. Paul Brooks, David J. Edwards, Craig E. Larson, and Nico Van Cleemput

🔗 Link to Article: https://doi.org/10.1287/ijds.2021.0043
🔗 Link to Presentation Video: https://www.youtube.com/watch?v=DfqJZ4rtxbQ

📝 Summary: This work uses a computational conjecturing framework to produce nonlinear bounds for continuous features and Boolean expressions for categorical features from input data. The authors report that their method recovers known patterns in data that no previous method could find.

Conjecturing-Based Discovery of Patterns in Data
We propose the use of a conjecturing machine that suggests feature relationships in the form of bounds involving nonlinear terms for numerical features and Boolean expressions for categorical featu...

29/01/2024

We are thrilled to welcome our new associate editors, Associate Professor Jing Wang and Professor Wenjun Zhou!

20/01/2024

💡 New Article Alert in the INFORMS Journal on Data Science (IJDS)

"Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms" by Siong Thye Goh, Lesia Semenova, Cynthia Rudin

🔗 Link to Article: https://doi.org/10.1287/ijds.2021.0001
🖥 Link to Code: https://codeocean.com/capsule/2414499/tree/v1

📝 Summary: The authors introduce three tree-based density estimation methods for categorical data. The models are sparse, and users can specify the desired number of leaves, branches, or rules through a prior.

🙋‍♀️What is the most important finding in this work?
This work introduces high-dimensional analogs to the histogram: sparse piecewise-constant density estimators for binary/categorical data. The Bayesian priors encourage sparsity, which supports interpretability; on crime data describing how often different types of break-ins occur, the models are 50 times sparser than high-dimensional histograms.

🔍 What is the impact of the research to the community?
The three methods produce sparse density estimation models that can be printed on an index card and yet provide insight into real datasets that could not have been reliably obtained in any other way. Visualizing the estimated density values can aid in understanding the data distribution and assist with decision-making.
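A rough Python sketch of the piecewise-constant idea for binary data, assuming a fixed, hand-chosen tree: each leaf's density is the share of training points that fall in it divided by the number of cells (binary configurations) the leaf covers. The leaf structure, toy data, and function names (`fit_leaf_densities`, `density`) are hypothetical; the paper's Bayesian priors and its search over trees and lists are not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

# A leaf is a conjunction of conditions: feature index -> required binary value.
Leaf = Dict[int, int]


def in_leaf(x: Sequence[int], leaf: Leaf) -> bool:
    """True if the binary vector x satisfies every condition of the leaf."""
    return all(x[j] == v for j, v in leaf.items())


def fit_leaf_densities(X: List[Tuple[int, ...]], leaves: List[Leaf], d: int) -> List[float]:
    """Piecewise-constant density on {0,1}^d.

    Each leaf gets (share of training points in the leaf) / (number of cells
    it covers). Assumes the leaves partition the space, as the leaves of a
    tree do, so the total probability mass is 1.
    """
    n = len(X)
    densities = []
    for leaf in leaves:
        n_leaf = sum(in_leaf(x, leaf) for x in X)
        cells = 2 ** (d - len(leaf))  # each unconstrained feature can be 0 or 1
        densities.append(n_leaf / n / cells)
    return densities


def density(x: Sequence[int], leaves: List[Leaf], densities: List[float]) -> float:
    """Look up the density value of the leaf that contains x."""
    for leaf, p in zip(leaves, densities):
        if in_leaf(x, leaf):
            return p
    raise ValueError("the leaves do not cover x")


# Hypothetical depth-2 tree on d = 3 binary features:
# split on x0; if x0 == 0, split again on x1.
leaves = [{0: 1}, {0: 0, 1: 1}, {0: 0, 1: 0}]
X = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1),
     (0, 1, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1)]
densities = fit_leaf_densities(X, leaves, d=3)
print(densities)  # [0.15625, 0.125, 0.0625]
print(density((0, 1, 1), leaves, densities))  # 0.125
```

The whole model is three rules and three numbers, which is the sense in which such estimators can fit on an index card; a full high-dimensional histogram would need one value per cell.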

Cost Patterns of Multiple Chronic Conditions: A Novel Modeling Approach Using a Condition Hierarchy 02/01/2024

💡 Celebrate the arrival of 2024 by delving into this article published in the INFORMS Journal on Data Science

"Cost Patterns of Multiple Chronic Conditions: A Novel Modeling Approach Using a Condition Hierarchy" by Lida Apergi, Margret Bjarnadottir, John Baras, and Bruce L. Golden

🔗 Link to Article: https://doi.org/10.1287/ijds.2022.0010

"This study introduces a unique modeling approach, drawing inspiration from backward elimination and incorporating a cost hierarchy to minimize information loss. The cost of each condition is modeled as a function of the number of other, more expensive chronic conditions an individual has. By applying this method to extensive claims data from 2007 to 2012, the research identifies individuals with one or more chronic conditions, estimates their total 2012 healthcare expenditures, and employs regression analysis and clustering to characterize the cost patterns of 69 chronic conditions. The hierarchical model adeptly captures intricate interactions, offering potential enhancements in decision-making, particularly in situations where enumerating all possible factor combinations is impractical, such as in financial risk scoring and pay structure design."
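The hierarchy idea in the quote can be sketched as feature construction: rank conditions by average cost, and for each condition a patient has, count how many of the patient's other conditions rank above it; that condition's cost contribution then depends on the count. In the Python sketch below, the condition names, average costs, and per-count cost levels are hypothetical placeholders rather than values from the paper, and the simple lookup table stands in for whatever regression the authors actually fit.

```python
from typing import Dict, List, Set


def more_expensive_counts(conditions: Set[str], avg_cost: Dict[str, float]) -> Dict[str, int]:
    """For each condition the patient has, count the patient's other
    conditions with a higher average cost (i.e., ranked above it)."""
    return {
        c: sum(1 for o in conditions if o != c and avg_cost[o] > avg_cost[c])
        for c in conditions
    }


def predicted_cost(
    conditions: Set[str],
    avg_cost: Dict[str, float],
    cost_by_count: Dict[str, List[float]],
) -> float:
    """Sum each condition's cost, looked up by how many more expensive
    conditions co-occur; counts beyond the table reuse its last entry."""
    total = 0.0
    for c, k in more_expensive_counts(conditions, avg_cost).items():
        levels = cost_by_count[c]
        total += levels[min(k, len(levels) - 1)]
    return total


# Hypothetical numbers, for illustration only.
avg_cost = {"hypertension": 2000.0, "diabetes": 5000.0, "heart_failure": 12000.0}
cost_by_count = {
    "hypertension": [2000.0, 1200.0, 800.0],
    "diabetes": [5000.0, 3500.0],
    "heart_failure": [12000.0],
}
patient = {"hypertension", "diabetes", "heart_failure"}
print(more_expensive_counts(patient, avg_cost))
print(predicted_cost(patient, avg_cost, cost_by_count))  # 16300.0
```

Here hypertension co-occurs with two more expensive conditions, so its cost is read from the third entry of its table. The decreasing levels are an assumption chosen for illustration; the point of the structure is that each condition needs only one small table indexed by a count, instead of a term for every combination of conditions.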

Cost Patterns of Multiple Chronic Conditions: A Novel Modeling Approach Using a Condition Hierarchy
Healthcare cost predictions are widely used throughout the healthcare system. However, predicting these costs is complex because of both uncertainty and the complex interactions of multiple chronic...

23/12/2023

💡 Explore the new article at IJDS during your holiday break!

"Interpretable Hierarchical Deep Learning Model for Noninvasive Alzheimer's Disease Diagnosis" by Maryam Zokaeinikoo, Pooyan Kazemian, and Prasenjit Mitra

🔗 Link to Article: https://doi.org/10.1287/ijds.2020.0005
🖥️ Access the code at: https://codeocean.com/capsule/2881658/tree/v1

"This study introduces an interpretable hierarchical deep learning model for the noninvasive and affordable detection of Alzheimer's disease. The model utilizes transcripts of patient interviews, employing a novel hierarchical attention mechanism to capture temporal dependencies in longitudinal data. Results show a 96% accuracy in detecting Alzheimer's from interviews, offering interpretability through importance scores for words, sentences, and transcripts. This approach could enhance diagnosis, improve patient outcomes, and contribute to cost containment, providing a promising alternative to expensive and invasive imaging methods."


Diversity Subsampling: Custom Subsamples from Large Data Sets 21/12/2023

💡 Explore the latest article at IJDS during your holiday break!

"Diversity Subsampling: Custom Subsamples from Large Data Sets" by Boyang Shang, Daniel W. Apley, and Sanjay Mehrotra

🔗 Link to Article: https://doi.org/10.1287/ijds.2022.00017

"This paper proposes a novel diversity subsampling algorithm suited to large real-world data sets. It performs better and runs far faster than existing algorithms. Requiring the subsample to approximate a uniform sample over the effective support of the data is a more efficient and effective way to select a diverse subsample from a large data set than maximizing the minimum distance between points. A diverse subsample is useful in supervised learning settings, for example to address covariate drift or to find the global optimum of a response surface."

🖥️ Access the code at: https://doi.org/10.24433/CO.8309237.v3
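The summary's core idea, making the subsample look like a uniform sample over the data's effective support, can be illustrated with a generic inverse-density heuristic. The Python sketch below is not the authors' algorithm (their code is linked above): it assumes a k-nearest-neighbour density estimate and draws points with probability proportional to its inverse, so crowded regions are down-weighted. The function name `inverse_density_subsample` and its parameters `k` and `seed` are hypothetical.

```python
import numpy as np


def inverse_density_subsample(X: np.ndarray, m: int, k: int = 10, seed: int = 0) -> np.ndarray:
    """Return indices of m distinct rows of X, drawn with probability
    proportional to the inverse of a k-nearest-neighbour density estimate."""
    n, d = X.shape
    # All pairwise Euclidean distances: O(n^2) memory, fine for a sketch only.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k is the distance to the k-th nearest neighbour.
    r_k = np.sort(dist, axis=1)[:, k]
    # The k-NN density estimate is proportional to 1 / r_k**d,
    # so its inverse (the sampling weight) is proportional to r_k**d.
    weights = r_k ** d
    p = weights / weights.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(n, size=m, replace=False, p=p)


# Toy data: a tight cluster of 900 points plus 100 points spread over a wide square.
rng = np.random.default_rng(1)
dense = rng.normal(0.0, 0.1, size=(900, 2))
sparse = rng.uniform(-3.0, 3.0, size=(100, 2))
X = np.vstack([dense, sparse])

idx = inverse_density_subsample(X, m=50, k=10, seed=0)
print("share of subsample from the sparse background:", np.mean(idx >= 900))
```

Although 90% of the toy points sit in one tight cluster, most of the subsample comes from the sparse background, which is the "uniform over the support" behaviour described above. The full distance matrix limits this sketch to small data; a large data set would need a nearest-neighbour index instead.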

Diversity Subsampling: Custom Subsamples from Large Data Sets
Subsampling from a large unlabeled (i.e., no response values are available yet) data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction...

Credit Risk Modeling with Graph Machine Learning 04/12/2023

💡 Check out the new article at IJDS!

"Credit Risk Modeling with Graph Machine Learning" by Sanjiv Das, Xin Huang, Soji Adeshina, Patrick Yang, and Leonardo Bachega

Link to Article: https://doi.org/10.1287/ijds.2022.00018

"Credit ratings are traditionally generated from models that use financial statement data and market data, which are tabular (numeric and categorical). Using machine learning methods, this work constructs a network of firms using U.S. Securities and Exchange Commission (SEC) filings (denoted CorpNet) to enhance the traditional tabular data set with a corporate graph. This paper demonstrates that a corporate graph generated using text from SEC filings, used to fit graph neural networks (GNNs), performs better than traditional models based on tabular data alone. Constructing a network of corporate linkages is often challenging, and the paper shows how to do this with large-scale text processing. The community can use this approach to manufacture graphs for other applications as well. Additionally, this paper suggests that practitioners may want to use GNNs to improve existing credit rating models."

The code is available at https://codeocean.com/capsule/5230264/tree/v2

Credit Risk Modeling with Graph Machine Learning
Accurate credit ratings are an essential ingredient in the decision-making process for investors, rating agencies, bond portfolio managers, bankers, and policy makers, as well as an important input...

Modeling Financial Products and Their Supply Chains 28/11/2023

💡 Check out the new article at IJDS!

"Modeling Financial Products and Their Supply Chains" by Margrét Vilborg Bjarnadóttir and Louiqa Raschid

Link to Article: https://doi.org/10.1287/ijds.2020.0006

"The paper focuses on residential mortgage-backed securities, which were at the heart of the 2008 US financial crisis. We model and study how multiple financial institutions form a supply chain to create these securities. We show that communities of financial institutions along the supply chain are associated with the generation of a prospectus and a group of securities. We are the first to show that toxic communities that are closely linked to financial institutions that played a key role in the subprime crisis can increase the risk of failure of the securities."

The code is available at https://codeocean.com/capsule/7485173/tree/v1

Modeling Financial Products and Their Supply Chains
The objective of this paper is to explore how novel financial datasets and machine learning methods can be applied to model and understand financial products. We focus on residential mortgage backe...

15/11/2023

Thrilled to share that the Editor-in-Chief of the INFORMS Journal on Data Science, Professor Galit Shmueli, has been honored as a 2023 INFORMS Information Systems Society (ISS) Distinguished Fellow. This prestigious award is a testament to her exceptional intellectual contributions to the information systems discipline.

Warmest congratulations to Professor Galit Shmueli for this well-deserved honor! 🌐✨ Link: https://lnkd.in/eWhTaVq4

28/10/2023

We are thrilled to welcome our new Associate Editors, Associate Professor Mochen Yang and Associate Professor Yongxiang Li!

19/10/2023

Congratulations to the INFORMS Journal on Data Science 2023 Best Associate Editor awardees!

19/10/2023

Congratulations to Enric Junqué de Fortuny on receiving the INFORMS Journal on Data Science 2023 Meritorious Service Award!

19/10/2023

Congratulations to Professor Kwok-Leung Tsui on being named a 2023 INFORMS Fellow! For more information, visit: https://www.informs.org/News-Room/INFORMS-Releases/Awards-Releases/INFORMS-Names-2023-Fellows

https://connect.informs.org/discussion/congratulations-to-the-new-informs-fellows-class-of-2023

An Optimization-Based Order-and-Cut Approach for Fair Clustering of Data Sets 17/10/2023

💡 Check out the new article at IJDS!

"An Optimization-Based Order-and-Cut Approach for Fair Clustering of Data Sets" by Su Li, Hrayer Aprahamian, Maher Nouiehed, and Hadi El-Amine.

Link to Article: https://pubsonline.informs.org/doi/10.1287/ijds.2022.0005

"In this paper, we consider the problem of fair clustering of datasets. In particular, given a set of items each associated with a vector of non-sensitive attribute values and a categorical sensitive attribute (e.g., gender, race, etc.), our goal is to find a clustering of the items that minimizes the loss (i.e., clustering objective) function while imposing fairness measured by Rényi correlation."

The code is available at https://codeocean.com/capsule/3010305/tree/v1
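The fairness measure named above, Rényi (Hirschfeld-Gebelein-Rényi) maximal correlation, has a closed form when both variables are discrete: it equals the second-largest singular value of the matrix Q[i, j] = P(i, j) / sqrt(P(i) P(j)) built from the joint distribution of cluster label and sensitive attribute. The Python sketch below computes only this measure for a given clustering; it is not the paper's order-and-cut optimization, and the function name `renyi_correlation` is hypothetical.

```python
import numpy as np


def renyi_correlation(labels, sensitive) -> float:
    """Hirschfeld-Gebelein-Renyi maximal correlation between two discrete
    variables: the second-largest singular value of
    Q[i, j] = P(i, j) / sqrt(P(i) * P(j))."""
    _, a = np.unique(np.asarray(labels), return_inverse=True)
    _, b = np.unique(np.asarray(sensitive), return_inverse=True)
    a = a.ravel()
    b = b.ravel()
    # Empirical joint distribution of (cluster label, sensitive attribute).
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    p_a = joint.sum(axis=1)
    p_b = joint.sum(axis=0)
    q = joint / np.sqrt(np.outer(p_a, p_b))
    # Singular values come back in descending order; the largest is always 1.
    s = np.linalg.svd(q, compute_uv=False)
    # With a single category on either side only constant functions exist,
    # so the maximal correlation is 0.
    return float(s[1]) if s.size > 1 else 0.0


clusters = [0, 0, 1, 1]
print(renyi_correlation(clusters, ["a", "a", "b", "b"]))  # clusters mirror the attribute
print(renyi_correlation(clusters, ["a", "b", "a", "b"]))  # attribute independent of clusters
```

A value of 0 means the cluster labels carry no information about the sensitive attribute, while a value of 1 means some function of the cluster label perfectly predicts some non-constant function of the attribute; a fair clustering method keeps this quantity small while still minimizing the clustering loss.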

An Optimization-Based Order-and-Cut Approach for Fair Clustering of Data Sets
Machine learning algorithms have been increasingly integrated into applications that significantly affect human lives. This surged an interest in designing algorithms that train machine learning mo...

Registration-Free Localization of Defects in Three-Dimensional Parts from Mesh Metrology Data Using Functional Maps 28/09/2023

💡 Check out the new article at IJDS!

"Registration-Free Localization of Defects in Three-Dimensional Parts from Mesh Metrology Data Using Functional Maps" by Xueqi Zhao and Enrique del Castillo.

Link to Article: https://pubsonline.informs.org/doi/10.1287/ijds.2023.0030

"In this paper, the authors propose a novel registration-free solution to the part defect localization problem that arises after statistical process control (SPC) signals an out-of-control part. The approach leverages a spectral decomposition of the Laplace–Beltrami operator to create a functional map between the CAD and measured manifolds, which is used to locate defects on the suspect part. A computational complexity analysis demonstrates that this approach scales more efficiently with mesh size and is more stable than registration-based approaches."

The code is available at https://codeocean.com/capsule/4615101/tree/v1

Registration-Free Localization of Defects in Three-Dimensional Parts from Mesh Metrology Data Using Functional Maps
We consider a common problem occurring after using a statistical process control (SPC) method based on three-dimensional measurements: locate where on the surface of the part that triggered an out-...

17/08/2023

💡 Check out the new article at IJDS!

"A Supervised Tensor Dimension Reduction-Based Prognostic Model for Applications with Incomplete Imaging Data" by Chengyu Zhou and Xiaolei Fang

Link to Article: https://pubsonline.informs.org/doi/abs/10.1287/ijds.2022.x022

"In this work, a supervised dimension reduction model was proposed for the prognostics of applications with incomplete imaging data. Analytic solutions for parameter estimation were discussed. The proposed supervised dimension reduction-based prognostic model outperforms unsupervised dimension reduction-based models."

The code is available at https://github.com/czhou9/Code-and-Data-for-IJDS

13/08/2023

Professor Bianca Maria Colosimo comments on industry-academia collaborations

https://youtu.be/ubL13FG9TKs?list=PLJhO6J3qy8O67CWNWLqI44DQrcORXaXey

13/08/2023

Professor Bianca Maria Colosimo on what she looks for during the review process

https://youtu.be/RbtC9AbK9oo?list=PLJhO6J3qy8O67CWNWLqI44DQrcORXaXey


Bianca on what she looks for before sending a paper for review 13/08/2023

What are the three main things you look for in a paper before reviewing it? Professor Bianca Maria Colosimo, Senior Editor, INFORMS Journal on Data Science (IJDS)

13/08/2023

How is the INFORMS Journal on Data Science creating its own space? Insights from our Senior Editor, Prof. Bianca Maria Colosimo

https://youtu.be/z8BWxLTmk7o?list=PLJhO6J3qy8O67CWNWLqI44DQrcORXaXey

Bianca on articles relevant to IJDS 13/08/2023

Which articles do you think are relevant to submit to IJDS?

Introduction to Bianca 13/08/2023

Introduction to Professor Bianca Maria Colosimo, Senior Editor, INFORMS Journal on Data Science (IJDS)

Introduction to Bianca Maria Colosimo, SE @ IJDS. She co-leads the research laboratory AddMe Lab, one of the leading labs on additive manufacturing in Eu...

Current Issue | INFORMS Journal on Data Science 13/08/2023

New issue alert ⚠️
The new issue of the INFORMS Journal on Data Science is now available online. 📚 Check out the latest research and insights in the world of data science.

Link:

Current Issue | INFORMS Journal on Data Science

A Nonparametric Subspace Analysis Approach with Application to Anomaly Detection Ensembles 05/06/2023

💡Check out the new article at IJDS!

"Multiblock Parameter Calibration in Computer Models" by Cheoljoon Jeong, Ziang Xu, Albert S. Berahas, Eunshin Byon, Kristen Cetin

Link to Article: https://pubsonline.informs.org/doi/abs/10.1287/ijds.2023.0029

"This study focuses on the calibration of parameters in computer models, aiming to estimate unobservable parameters using physical process responses and computer model outputs. While previous studies commonly calibrate all parameters simultaneously using a complete data set, this study addresses the importance of calibrating parameters that are associated with specific subsets of data. To tackle this issue, the study introduces a new approach known as multiblock calibration, which involves minimizing multiple loss functions. Each loss function corresponds to a block of parameters that utilizes the corresponding data set, and the parameters are estimated using a nonlinear optimization technique. The study presents the convergence properties under specific conditions and quantifies the uncertainties in parameter estimation. Numerical studies and a real-world case study on building energy simulation demonstrate the superiority of this approach."

The code capsule is available on Code Ocean at https://codeocean.com/capsule/5557786/tree.


A Nonparametric Subspace Analysis Approach with Application to Anomaly Detection Ensembles 31/05/2023

New article alert! Check out the new article at IJDS!

"A Nonparametric Subspace Analysis Approach with Application to Anomaly Detection Ensembles" by Irad Ben-Gal, Marcelo Bacher, Morris Amara, Erez Shmueli

Link to Article: https://pubsonline.informs.org/doi/10.1287/ijds.2023.0027

"We introduce a novel subspace analysis approach called agglomerative attribute grouping (AAG). AAG identifies highly correlative attribute subspaces using a generalized multiattribute measure based on information theory. This approach improves anomaly detection, novelty detection, forecasting, and clustering tasks. AAG outperforms classical and state-of-the-art subspace analysis methods in most cases, generating fewer subspaces with fewer attributes and resulting in faster training times. The generated subspaces have broad applicability across various analytical tasks."

A Nonparametric Subspace Analysis Approach with Application to Anomaly Detection Ensembles
Identifying anomalies in multidimensional data sets is an important yet challenging task in many real-world applications. A special case arises when anomalies are occluded in a small subset of attr...

A Robust Approach to Quantifying Uncertainty in Matching Problems of Causal Inference 26/05/2023

New article alert! Check out the new article at IJDS!

"Adaptive Exploration and Optimization of Materials Crystal Structures" by Arvind Krishna, Huan Tran, Chaofan Huang, Rampi Ramprasad, V. Roshan Joseph.

Link to Article: https://pubsonline.informs.org/doi/epdf/10.1287/ijds.2023.0028

In this study, the authors propose an expansion-exploration-exploitation framework, extending the traditional exploration-exploitation Bayesian optimization framework, to find the global minimum of a complex potential energy surface. The approach shows promise for efficiently exploring the potential energy surface and identifying the most stable crystal structure, shedding light on optimization problems in materials science.

The code can be found at https://codeocean.com/capsule/3366149/tree

A Robust Approach to Quantifying Uncertainty in Matching Problems of Causal Inference
Unquantified sources of uncertainty in observational causal analyses can break the integrity of the results. One would never want another analyst to repeat a calculation with the same data set, usi...

Sequential Adversarial Anomaly Detection for One-Class Event Data 25/05/2023

Check out the new article at IJDS!

"Sequential Adversarial Anomaly Detection for One-Class Event Data" by Shixiang Zhu, Henry Shaowu Yuchi, Minghe Zhang, Yao Xie

Link to Article: https://pubsonline.informs.org/doi/10.1287/ijds.2023.0026

"This work considers the sequential anomaly detection problem in the one-class setting, when only the anomalous sequences are available, and proposes an adversarial sequential detector obtained by solving a minimax problem to find an optimal detector against the worst-case sequences from a generator. The generator captures the dependence in sequential events using the marked point process model. The detector sequentially evaluates the likelihood of a test sequence and compares it with a time-varying threshold, also learned from data through the minimax problem. The authors believe the proposed framework is a natural way to tackle the one-class anomaly detection problem, leveraging advances in adversarial learning. It may provide a first step towards bridging imitation learning and sequential anomaly detection."

The presentation slides are available at https://drive.google.com/file/d/1_wdVACf8F8VOSrwpkrOaU4VfEgiy0eNZ/view

The code is available at https://github.com/meowoodie/Fourier-Point-Process-via-Imitation-Learning

Sequential Adversarial Anomaly Detection for One-Class Event Data
We consider the sequential anomaly detection problem in the one-class setting when only the anomalous sequences are available and propose an adversarial sequential detector by solving a minimax pro...

Rhetoric Mining: A New Text-Analytics Approach for Quantifying Persuasion 18/05/2023

Check out the new article at IJDS!

"Rhetoric Mining: A New Text-Analytics Approach for Quantifying Persuasion" by Michelle M. H. Şeref, Onur Şeref, Alan S. Abrahams, Shawndra B. Hill, Quinn Warnick

Link to Article: https://pubsonline.informs.org/doi/abs/10.1287/ijds.2022.0024

"This work proposes rhetoric mining as a text-analytics method that quantifies persuasion. It utilizes a mixed-methodology approach, combining qualitative context analysis with automated tagging and quantification of rhetorical moves. By applying a sequence-based text-mining approach, it detects and analyzes complex discursive patterns. The method is illustrated through the analysis of arguments in an online investment community, identifying rhetorical moves such as ethos, hedging, and evidence type. This work employs rhetoric mining to identify persuasive argument styles and trustworthiness, offering a new analytical lens for studying persuasion in consumer decision-making within Information Systems research."

The code capsule is available at https://codeocean.com/capsule/9373643/tree/v1

Rhetoric Mining: A New Text-Analytics Approach for Quantifying Persuasion
Rhetoric mining is a novel text-analytics method for quantifying persuasion based on rhetorical analysis theory. Our mixed-methodology approach combines qualitative context analysis with automated ...

09/03/2023

All IJDS articles are free to access, supporting the dissemination of scientific knowledge. If you enjoy holding and reading the print issue, you can get an individual or institutional subscription: https://pubsonline.informs.org/page/ijds/prices-and-ordering


The first issue of the INFORMS Journal on Data Science is now available online. Check it out now via the link https://pu...