Data Science Central
Co-founded by Vincent Granville and part of the DSC community, our focus is on data science, ML, AI,
30 Python Libraries that I Often Use https://mltblog.com/3ONhMWi
This list covers well-known as well as specialized libraries that I use frequently. Applications include GenAI, data animations, LLMs, synthetic data generation and evaluation, ML optimization, scientific computing, statistics, web crawling, APIs, SQL, and more. I also mention my own libraries, and issues that I faced with standard ones. In several instances, such as sound generation, I did not use any library. In addition, I included some functions that I regularly call. In many cases, I explain why I had to create my own home-made versions.
30 Python Libraries that I Often Use - DataScienceCentral.com 30 Python libraries to solve most AI problems, including GenAI, data videos, synthetization, model evaluation, computer vision and more.
Gemini Ultra Unleashed: Google's Best LLM Now Available https://mltblog.com/3SBZzMz
A lot has changed for the better since the first announcement not long ago.
Hands-on workshop for developers and AI professionals, on state-of-the-art technology. Live demo and code-sharing session to see Gemini Ultra in action. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.
Probabilistic ANN: The Swiss Army Knife of GenAI https://mltblog.com/48hQWfY
ANN — Approximate Nearest Neighbors — is at the core of fast vector search, itself central to GenAI, especially GPT and LLM. My new methodology, abbreviated as PANN, has many other applications: clustering, classification, measuring the similarity between two datasets (images, soundtracks, time series, and so on), tabular data synthetization (improving poor synthetizations), model evaluation, […]
Probabilistic ANN: The Swiss Army Knife of GenAI - Machine Learning Techniques
Actions in GPTs: Developer Tips, Tricks & Techniques https://mltblog.com/3utzlDZ
Hands-on workshop for developers and AI professionals, on state-of-the-art technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.
How to Automate Data Cleaning, in a Nutshell
How to Automate Data Cleaning, in a Nutshell - DataScienceCentral.com Issues and solutions to automate data cleaning. Free your data scientists from the most boring tasks, making them happier and reducing costs.
Massively Speed-Up your Learning Algorithm, with Stochastic Thinning. Includes use case, Python code, regression and neural network illustrations.
Massively Speed-Up your Learning Algorithm, with Stochastic Thinning - Machine Learning Techniques Dramatically Speed-Up your Learning Algorithm, with Stochastic Thinning. Includes use case, Python code, regression and neural network illustrations.
More Fun Math Problems for Machine Learning Practitioners
More Fun Math Problems for Machine Learning Practitioners - DataScienceCentral.com This is part of a series featuring the following aspects of machine learning: Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science) Opinions, for instance about the value of a PhD in our field, or the use of some techniques Methods, principle...
Better, Faster, Less Expensive Synthetic Data Without Deep Learning
NoGAN: Ultrafast Data Synthesizer – My Talk at ODSC San Francisco - Machine Learning Techniques My talk at the ODSC Conference, San Francisco, October 2023. Includes Notebook demonstration, using our open-source Python libraries. View or download the PowerPoint presentation, here. I discuss NoGAN, an alternative to standard tabular data synthetization. It runs 1000x faster than GAN, consistent...
AI-based Object/Image Detection for Inventory Management https://mltblog.com/3SMRJRC
Hands-on workshop for developers and AI professionals, on state-of-the-art technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.
This is one of the AI applications where many companies recognize the value and are ready to invest, with guaranteed return thanks to low costs, proven technology, and automation.
Many of the requests we get from potential enterprise clients - even brick and mortar companies - are actually focused on this topic: automated classification and management of inventory or digital content, with an interest in automated image labeling and classification, as well as creating document taxonomies and better search tools (sometimes with automated data analysis) to help internal customers quickly find what they need.
NoGAN: Ultrafast Data Synthesizer and New Evaluation Metric - My Presentation at ODSC San Francisco
GenAI Breakthrough: Fast, High Quality Tabular Data Synthetization. Our presentation/workshop about NoGAN at ODSC San Francisco, October 2023. Runs 1000x faster than GAN, consistently delivering better results according to th...
The Riemann Hypothesis in One Picture
The Riemann Hypothesis in One Picture - DataScienceCentral.com With visual, simple, intuitive method for supervised classification
Simple Introduction to Public-Key Cryptography and Cryptanalysis: Illustration with Random Permutations
Simple Introduction to Public-Key Cryptography and Cryptanalysis: Illustration with Random Permutations - DataScienceCentral.com In this article, I illustrate the concept of asymmetric key with a simple example. Rather than discussing algorithms such as RSA (still widely used, for instance to set up a secure website), I focus on a system easier to understand, based on random permutations. I discuss how to generate these rando...
GenAI: Fast Vector Search at Scale (Demo on AWS)
Register at https://mltblog.com/3UGF0l5.
ANN stands for Approximate Nearest Neighbors, a faster yet high-quality alternative to exact but slow KNN, for vector search in GenAI contexts (LLM, GPT, multimodal, and so on). My team is developing proprietary technology on this topic, with a paper coming soon. In the meantime, if you want to see real enterprise case studies, and an existing fully scaled algorithm in action, this hands-on workshop is for you.
Intended for developers and AI professionals, featuring state-of-the-art GenAI technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.
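To make the exact-versus-approximate trade-off concrete, here is a minimal sketch using only the Python standard library. It is not the workshop's algorithm nor the proprietary technology mentioned above: it swaps in a generic random-hyperplane hashing scheme, and the names `exact_nn`, `HyperplaneIndex`, and `n_planes` are hypothetical, chosen for illustration. The exact search scans every vector; the approximate one only scans the vectors that share the query's hash bucket.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def exact_nn(query, vectors):
    """Brute-force search: scan every vector, return the index of the closest."""
    return min(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))

class HyperplaneIndex:
    """Illustrative approximate index (an assumption, not a library API)."""

    def __init__(self, vectors, n_planes=8, seed=0):
        rng = random.Random(seed)
        dim = len(vectors[0])
        self.vectors = vectors
        # Each random hyperplane contributes one bit to the bucket key.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_planes)]
        self.buckets = {}
        for i, v in enumerate(vectors):
            self.buckets.setdefault(self._key(v), []).append(i)

    def _key(self, v):
        return tuple(dot(p, v) >= 0 for p in self.planes)

    def query(self, q):
        # Only vectors in the query's bucket are compared to the query.
        candidates = self.buckets.get(self._key(q))
        if not candidates:
            candidates = range(len(self.vectors))  # empty bucket: exact scan
        return min(candidates, key=lambda i: sq_dist(q, self.vectors[i]))

# Toy random vectors, generated here purely for the demonstration.
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
index = HyperplaneIndex(vectors)
query = [rng.gauss(0, 1) for _ in range(16)]
print("exact:", exact_nn(query, vectors), "approximate:", index.query(query))
```

The approximate answer may differ from the exact one, since the true nearest neighbor can fall in a different bucket. Fewer hyperplanes give larger buckets (slower, more accurate); more hyperplanes give smaller buckets (faster, less accurate).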
Synthetizing the Insurance Dataset Using Copulas - Towards Better Synthetization
Synthetizing the Insurance Dataset Using Copulas: Towards Better Synthetization - Machine Learning Techniques This article is an extract from my book “Synthetic Data and Generative AI”, available here. In the context of synthetic data generation, I've been asked a few times to provide a case study focusing on real-life tabular data used in the finance or health industry. Here we go: this article fills t...
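As a rough illustration of the copula idea, here is a minimal two-column Gaussian copula sketch using only the Python standard library. It is not the code from the book or the article: the function names (`normal_scores`, `pearson`, `empirical_quantile`, `synthesize`) are hypothetical, and the input is toy data generated on the spot, not the insurance dataset. The steps are: map each column to normal scores through its ranks, estimate the correlation of those scores, sample correlated normals, then map them back through each column's empirical quantiles.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def normal_scores(column):
    """Map values to standard-normal scores through their ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def empirical_quantile(sorted_column, u):
    """Inverse empirical CDF: observed value at probability level u."""
    idx = min(int(u * len(sorted_column)), len(sorted_column) - 1)
    return sorted_column[idx]

def synthesize(col_a, col_b, n_rows, seed=0):
    """Sample synthetic (a, b) pairs from a fitted Gaussian copula."""
    rng = random.Random(seed)
    rho = pearson(normal_scores(col_a), normal_scores(col_b))
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    rows = []
    for _ in range(n_rows):
        z1 = rng.gauss(0, 1)
        # Second normal is built to have correlation rho with the first.
        z2 = rho * z1 + max(0.0, 1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        rows.append((empirical_quantile(sorted_a, u1),
                     empirical_quantile(sorted_b, u2)))
    return rows

# Toy data (an assumption for the demo): two positively related columns.
rng = random.Random(1)
ages = [rng.uniform(18, 65) for _ in range(300)]
charges = [200 * a + rng.gauss(0, 2000) for a in ages]
synth = synthesize(ages, charges, n_rows=500)
print("real correlation:", pearson(ages, charges))
print("synthetic correlation:", pearson([r[0] for r in synth],
                                        [r[1] for r in synth]))
```

Because the synthetic values are drawn from the observed quantiles, each marginal distribution is preserved by construction, while the copula carries the dependence between the columns. A Gaussian copula only captures dependence summarized by a single correlation, which is one reason to look for better synthetization methods.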
A Simple Regression Problem
A Simple Regression Problem - DataScienceCentral.com This article is part of a new series featuring problems with solution, to help you hone your machine learning and pattern recognition skills. Try to solve this problem by yourself first, before looking at the solution. Today’s problem also has an intriguing mathematical appeal and solution: this a...
Generative AI: Synthetic Data Vendor Comparison and Benchmarking Best Practices
Generative AI: Synthetic Data Vendor Comparison and Benchmarking Best Practices - Machine Learning Techniques The goal of data synthetization is to produce artificial data that mimics the patterns and features present in existing, real data. Many generation methods and evaluation techniques are available, depending on purposes, the type of data, and the application field. Everyone is familiar with synthetic...
Book: Intuitive Machine Learning and Explainable AI
New Book: Intuitive Machine Learning - DataScienceCentral.com Intuitive Machine Learning with focus on explainable AI, human-friendly intelligence, powerful visualizations and applications.
Machine Learning Cloud Regression: The Swiss Army Knife of Optimization
Machine Learning Cloud Regression: The Swiss Army Knife of Optimization - Machine Learning Techniques Entitled “Machine Learning Cloud Regression: The Swiss Army Knife of Optimization”, the full version in PDF format is accessible in the “Free Books and Articles” section, here. Also discussed in detail with Python code in chapter 1 of my book “Intuitive Machine Learning and Explainable AI...
Better LLMs with Shorter Embeddings: Part 3 https://mltblog.com/3HGj6Xi
Variable-length embeddings and fast ANN-like search (approximate nearest neighbors) for better, lighter, and less expensive LLMs
Better LLMs with Shorter Embeddings: Part 3 - DataScienceCentral.com
18 Differences Between Good and Great Data Scientists
18 Differences Between Good and Great Data Scientists - DataScienceCentral.com machine learning, data science career, business analytics, data science lifecycle, data visualizations
How to Choose the Best Machine Learning Technique: Comparison Table
How to Choose the Best Machine Learning Technique: Comparison Table - DataScienceCentral.com
Creating Embeddings on Large, Real-Time Data with OpenAI https://mltblog.com/3SiMGXF
Hands-on workshop for developers and AI professionals, on state-of-the-art GenAI technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.
I recently showed how to optimize embeddings and RAG architecture in LLMs and GPT-like applications, with home-made systems. This webinar discusses a real business case, with much larger input data in real time, using efficient tools. Embeddings are the central piece.
New Python Library to Evaluate AI-generated Data and Compare Models
New Python Library to Evaluate AI-generated Data and Compare Models - Machine Learning Techniques The library is called GenAI-Evaluation. You can use it, for instance, to assess the quality of tabular synthetic data. In this case, it measures how faithfully the synthetization mimics the real data it is derived from, by comparing the full joint empirical distributions (ECDF) attached to the two datasets. It works both...
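To illustrate what comparing joint empirical distributions can look like, here is a minimal sketch using only the Python standard library. It does not use or reproduce the library's API: the functions `ecdf` and `ecdf_distance` are hypothetical names, the datasets are toy Gaussian samples generated for the demo, and the gap between the two ECDFs is only checked at the observed rows, which approximates the true maximum gap.

```python
import random

def ecdf(data, point):
    """Joint empirical CDF: share of rows that are <= point in every coordinate."""
    hits = sum(all(x <= p for x, p in zip(row, point)) for row in data)
    return hits / len(data)

def ecdf_distance(real, synth):
    """Largest gap between the two joint ECDFs, checked at every observed row."""
    points = real + synth
    return max(abs(ecdf(real, p) - ecdf(synth, p)) for p in points)

# Toy data (an assumption for the demo): 2D Gaussian samples.
rng = random.Random(0)

def sample(n, shift):
    return [[rng.gauss(shift, 1), rng.gauss(shift, 1)] for _ in range(n)]

real = sample(200, 0.0)
good = sample(200, 0.0)  # same distribution as the real data
poor = sample(200, 2.0)  # shifted distribution, a poor synthetization

print("good synthetization:", ecdf_distance(real, good))
print("poor synthetization:", ecdf_distance(real, poor))
```

The distance lies between 0 (identical empirical distributions) and 1 (no overlap), so a lower value indicates a more faithful synthetization. Working with the joint ECDF, rather than one column at a time, lets the metric pick up differences in the dependence between features.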
A Synthetic Stock Exchange Played with Real Money. Includes Python code dealing with gigantic numbers using exact arithmetic.
A Synthetic Stock Exchange Played with Real Money - Machine Learning Techniques Not only that, but you can predict -- more precisely compute with absolute certainty -- what the value of any stock will be tomorrow. Transaction fees are well below 0.05% and the market, at least in the version presented here, is fair: in other words, a zero-sum game if you play by luck. If instead
Python Code and Material from the Book "Stochastic Processes and Simulations" - GitHub Repository
GitHub - VincentGranville/Point-Processes: This repository contains the material (datasets, code, videos, spreadsheets) related to my book Stochastic Processes and Simulations - A Machine Learning Perspective.
An Intriguing Job Interview Question for AI/ML Professionals
An Intriguing Job Interview Question for AI/ML Professionals - DataScienceCentral.com Intriguing technical job interview questions for candidates applying to machine learning and AI jobs, with 4 difficulty levels.
Book: Interpretable Machine Learning
New Book: Intuitive Machine Learning and Explainable AI - Machine Learning Techniques Intuitive Machine Learning with focus on explainable AI, human-friendly intelligence, powerful visualizations and applications. By Vincent Granville Ph.D, published in September 2022. PDF format, 156 pages. Version 1.0 with Python code. The book is available here. For my upcoming course based on thi...
Build Document/Image Analytics with GPT-4 Vision https://mltblog.com/48Odh69
Showcasing a conceptual application demo that can analyze insurance claims data, interpret PDF documents and photos of car accidents to infer damage types and estimate payouts.
Hands-on workshop for developers and AI professionals, on state-of-the-art GenAI technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.
New GenAI Evaluation Metric, Ultrafast Search, and Perfect Randomness
New GenAI Evaluation Metric, Ultrafast Search, and Perfect Randomness - Machine Learning Techniques This article covers three different GenAI topics. First, I introduce one of the best random number generators (PRNG) with infinite period. Then I show how to evaluate the synthesized numbers using the full multivariate empirical distribution (same as KS that I used for NoGAN evaluation), but this ti...
My Book on Poisson-binomial Stochastic Processes and Simulations
Stochastic Processes, 2nd Edition, now with Python Code - Machine Learning Techniques The book covers supervised classification, including fractal classification, as well as unsupervised clustering, using an innovative approach. Datasets are first mapped onto an image, then processed using image filtering techniques. I discuss the analogy with neural networks, comparing very deep but...