Andrejus Baranovski

Blog about Oracle, Full Stack, Machine Learning and Cloud

Data Annotation with SVG and JavaScript

Mon, 2022-05-16 01:35
I explain how to build a simple data annotation tool with SVG and JavaScript in an HTML page. The sample code renders two boxes in SVG on top of a receipt image. You will learn how to select and switch between annotation boxes. Enjoy!


PyScript - Deep Dive for Developer

Mon, 2022-05-09 01:52
PyScript was announced last week at PyCon US 2022. This is good news for all Python developers: we can now run Python logic serverless, in the browser. This video is a deep dive, with a step-by-step explanation of the sample application code. The application includes an input component, a chart, and a table. I explain how to update the UI when the input component changes. I hope this will be useful for your practical knowledge.


PyScript - Python in the Browser

Mon, 2022-05-02 03:15
Exciting times! PyScript was announced at PyCon US 2022. With the PyScript framework, we can run regular Python code directly in the browser, placed inside a py-script tag. This opens up lots of new possibilities for serverless Python applications, using the same API and libraries you use on the server side. Think of the browser as a VM that runs your code.


UI for ML - Django, React or Streamlit?

Tue, 2022-04-26 11:29
The UI is an important part of a successful ML app. In this video I discuss the UI options I looked into when building the UI for our ML product. When deciding which UI framework or library to use, you should pay attention to several things, such as ease of data transfer, UI flexibility, and the ability to build user-friendly functionality.


Mindee docTR - Probably the Best Open-Source OCR

Mon, 2022-04-18 09:16
Do you want to build an ML pipeline to automate data extraction from business documents (receipts, invoices, forms)? Then your first step should be to integrate OCR for text extraction. OCR quality must be good, because the whole pipeline depends on the quality of the initial text extraction. If the extracted data is accurate, the ML models will be able to run proper classification. I spent time researching available OCR solutions and I think Mindee docTR is currently one of the best open-source options available. Check the video, where I run and show multiple tests.


Document Information Extraction Demo on Hugging Face Spaces

Mon, 2022-04-11 14:41
This video shows how a fine-tuned LayoutLMv2 document understanding and information extraction model runs in the Hugging Face Spaces demo environment. I show how data extraction works for different receipts and why you should not rely on the OCR that comes pre-configured with the LayoutLMv2 model.


Hugging Face LayoutLMv2 Model True Inference

Sun, 2022-03-27 14:33
I explain why OCR quality matters for the performance of the Hugging Face LayoutLMv2 model in document data classification. If the input from OCR is poor, the ML classification inference results will be low quality too. This is why it is important to use a high-quality OCR system to extract text and coordinates from the document before applying the ML solution.
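
To make "text and coordinates" more concrete: LayoutLM-family models expect each word's bounding box to be scaled to a 0-1000 range relative to the page size. Below is a minimal sketch of that scaling, assuming the OCR step returns pixel boxes as (x0, y0, x1, y1). The function name, the page size, and the sample words are hypothetical and are not taken from the video.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical OCR output for an 800x1200 pixel receipt scan.
ocr_words = ["TOTAL", "12.50"]
ocr_boxes = [(80, 900, 240, 936), (600, 900, 720, 936)]

normalized = [normalize_box(box, 800, 1200) for box in ocr_boxes]
for word, box in zip(ocr_words, normalized):
    print(word, box)
```

If the OCR boxes are inaccurate, the scaled boxes are inaccurate too, which is one way poor OCR degrades the classification results.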


Get Receipt Data with Hugging Face ML Model

Sun, 2022-03-20 10:29
This tutorial is about how to use a fine-tuned Hugging Face model to extract data from scanned receipt documents. We execute inference by passing the receipt image, along with words and coordinates, to the model. As a result, we get back predictions: class labels assigned to each input. This helps to classify document elements and extract the correct data. I share a hint on how to match input words with classified labels. Input words and coordinates are expected to come from a separate OCR step.
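
One common way to match words with predicted labels is sketched below. It is not necessarily the hint from the video. The sketch assumes the tokenizer splits words into sub-tokens and exposes a per-token word index (None for special tokens), and it keeps the prediction of the first sub-token of each word. The label names and all sample values are hypothetical.

```python
def labels_per_word(words, word_ids, token_label_ids, id2label):
    """Assign each word the label predicted for its first sub-token."""
    first_label = {}
    for word_idx, label_id in zip(word_ids, token_label_ids):
        if word_idx is None:
            continue  # special tokens carry no word
        if word_idx in first_label:
            continue  # later sub-tokens of a word already handled
        first_label[word_idx] = id2label[label_id]
    return [(words[i], first_label[i]) for i in sorted(first_label)]


# Hypothetical inputs: two OCR words, six tokens after tokenization.
words = ["TOTAL", "12.50"]
word_ids = [None, 0, 1, 1, 1, None]
token_label_ids = [0, 1, 2, 2, 0, 0]
id2label = {0: "O", 1: "TOTAL_KEY", 2: "TOTAL_PRICE"}

pairs = labels_per_word(words, word_ids, token_label_ids, id2label)
print(pairs)
```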


Fine-Tuning with Hugging Face Trainer

Sun, 2022-03-13 16:37
In this tutorial, I explain how I used Hugging Face Trainer with PyTorch to fine-tune the LayoutLMv2 model for data extraction from documents (based on the CORD dataset of receipts). The advantage of Hugging Face Trainer is that it simplifies the model fine-tuning pipeline, and you can easily upload the model to the Hugging Face model hub.


Hugging Face Datasets - Example with Receipts Data

Sun, 2022-03-06 13:39
The Hugging Face Datasets library provides a useful API for working with data for ML model fine-tuning. It allows you to load and process any external dataset with your own Python functions. As a result, you get a unified data interface and can reuse the same API for fine-tuning various Hugging Face models.


How To Evaluate Hugging Face Saved Model

Sun, 2022-02-20 15:15
You fine-tuned a Hugging Face model on a Colab GPU and want to evaluate it locally? I explain how to avoid a mistake with the labels mapping array. The same labels mapping you used to fine-tune the model should be used when evaluating the model (or running inference with it) in the local environment (or in another Colab session).
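
One way to keep the mapping identical across environments is to save it next to the model and load it again at evaluation time. This is a minimal sketch, assuming the mismatch comes from rebuilding the label list in a different order. The helper names, file name, and labels are hypothetical, and the video may solve this differently.

```python
import json
import os
import tempfile


def build_label_maps(labels):
    """Build label<->id maps in a deterministic (sorted) order."""
    unique = sorted(set(labels))
    label2id = {label: idx for idx, label in enumerate(unique)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


def save_id2label(id2label, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(id2label, f)


def load_id2label(path):
    # JSON object keys are always strings, so convert them back to ints.
    with open(path, encoding="utf-8") as f:
        return {int(key): value for key, value in json.load(f).items()}


# Fine-tuning side (hypothetical label names).
label2id, id2label = build_label_maps(["TOTAL", "O", "ITEM", "O", "TOTAL"])
path = os.path.join(tempfile.mkdtemp(), "id2label.json")
save_id2label(id2label, path)

# Evaluation side: load the saved mapping instead of rebuilding it.
restored = load_id2label(path)
print(restored == id2label)
```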

Development Workflow with Hugging Face Transformer Model

Sun, 2022-02-13 14:02
This tutorial explains how I do app development with a Hugging Face Transformer model. Typically the flow involves fine-tuning the model on a Colab GPU. The fine-tuned model is downloaded to my local development workstation, where I continue development and use the model for inference. To be able to run complex library dependencies locally, my development environment is set up with a remote Python interpreter through PyCharm and Docker.


What is Blockchain?

Sun, 2022-02-06 13:45
There is a lot of buzz around blockchain, Web3 and crypto. When studying blockchain concepts I found this demo app, and it was very useful to me. The app demonstrates how blockchain works. I decided to share it with you because it really helps to understand the foundations and concepts.


Ethereum Test ETH for Web3/Blockchain Development

Tue, 2022-02-01 08:59
This is my first tutorial about Web3 and the Ethereum blockchain. I explain how to get test Ethereum tokens, review your wallet in MetaMask, and transfer tokens between accounts.


Hugging Face Gradio App on Docker

Sun, 2022-01-23 13:49
This quick tutorial explains and shows how to run a Hugging Face model with Gradio UI on Docker.


Running Hugging Face LayoutLM Model with PyCharm and Docker

Sat, 2022-01-15 14:09
This tutorial explains how to run the Hugging Face LayoutLM model locally with a PyCharm remote interpreter. This is cool, because a remote interpreter allows you to run and debug your custom logic while running the Hugging Face model and its dependencies in a Docker container. I share a Dockerfile, which helps to set up all dependencies. Enjoy!


Table Query with Hugging Face ML

Sun, 2022-01-09 13:35
Yes, you can search through table data with a Hugging Face model called TAPAS. I show how it works with a sample CSV and example queries. The app runs on Hugging Face Spaces, and you can play with it and upload your own CSV files to test. Give it a try, maybe ML can replace SQL?


Hugging Face Gradio Python UI and CSV Processing

Sun, 2022-01-02 14:48
I explain how to process a CSV file uploaded through Gradio UI in Python. Gradio is part of Hugging Face. You will also learn how to define inputs and outputs for Gradio, to render UI components out of the box. Towards the end of the video, I share a tip on how to read the error message if an error happens during app development.
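
The processing side can be sketched as a plain Python function that receives a file path and returns a summary. This is an illustration only, with a hypothetical function name and sample data. It assumes the uploaded file reaches your function as a path on disk; how Gradio hands over the upload (a path or a file object) depends on the component settings and the Gradio version. A function like this would be passed as the fn argument of gr.Interface.

```python
import csv
import os
import tempfile


def summarize_csv(path):
    """Return the column names and the number of data rows in a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = reader.fieldnames or []
    return {"columns": columns, "rows": len(rows)}


# Stand-in for an uploaded file (hypothetical content).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write("item,price\ncoffee,2.50\ntea,1.80\n")

summary = summarize_csv(tmp.name)
os.remove(tmp.name)
print(summary)
```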


Hugging Face Gradio Python UI for ML

Sun, 2021-12-26 14:16
I dive into Gradio UI with Python, which is now part of Hugging Face. This is a very cool and simple-to-use library that helps to build UIs for ML models quickly. It is useful for sharing ML models with the community and running quick demos to showcase your ML model's capabilities. I explain the app code structure and how you can map ML model inputs and outputs with Gradio.


TensorFlow.js Node on Docker and Kubernetes

Sat, 2021-12-18 08:37
I explain how to dockerize a TensorFlow.js Node app and also run it on Kubernetes. This work was done as part of our open-source MLOps solution, Skipper.