Basic tools that help find bottlenecks in your application

When experimenting with a project, it can be helpful (or simply fun!) to try and see which parts of the code are, for example, the most memory-consuming. If those places in the code pose a problem, it’s worth figuring out how they can be improved. Sometimes, all that’s needed is a couple of tweaks to shave off some time here and there. Other times, it’s a good idea to consider a completely different approach.
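As a rough illustration of checking how memory-hungry a piece of code is, here is a minimal sketch using the standard library's `tracemalloc` module. The excerpt does not show which tools the article itself uses, so treat this as one possible approach; `build_squares` and `measure_peak_memory` are hypothetical names made up for the example.

```python
import tracemalloc


def build_squares(n):
    """Hypothetical workload: materialize a list of n squared integers."""
    return [i * i for i in range(n)]


def measure_peak_memory(func, *args):
    """Run func(*args) and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


squares, peak_bytes = measure_peak_memory(build_squares, 100_000)
print(f"peak allocation: {peak_bytes / 1024:.0f} KiB")
```

Swapping the list comprehension for a generator and comparing the two peaks is a quick way to see whether a small tweak is enough or a different approach is needed.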

First, ask yourself whether profiling and optimizing the code justifies the extra complexity it may bring. It’s really important to carefully consider all…


Making Sense of Big Data

A concise overview of approaches available in Python

If you’re about to start a big data project, you will be either retrieving a lot of information or crunching big numbers on your machine, or both. However, if the code is sequential or synchronous, your application may start struggling.

Let’s see which concepts and Python libraries can improve your application’s performance in each case.

What concurrency and parallelism are and what problems they address

There are two dimensions along which you can speed up your program: I/O and CPU consumption. If your code does, for instance, a lot of file access or communication over the network, it’s I/O-bound. CPU-bound code involves heavy computation. …
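To make the I/O-bound case concrete, here is a small sketch that overlaps several waits using a thread pool from the standard library's `concurrent.futures`. It is only an illustration under stated assumptions: `fake_download` is a hypothetical stand-in that sleeps instead of touching a file or the network, and the excerpt does not show which libraries the article goes on to recommend.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id):
    """Hypothetical I/O-bound call: sleeps instead of hitting a network."""
    time.sleep(0.2)
    return item_id * 2


item_ids = [1, 2, 3, 4, 5]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    # map() returns results in input order even though the calls overlap
    results = list(pool.map(fake_download, item_ids))
elapsed = time.perf_counter() - start

print(results)
print(f"{len(item_ids)} waits of 0.2s each took {elapsed:.2f}s in total")
```

Threads help here because the workers spend their time waiting rather than computing. For CPU-bound work in CPython, threads generally do not run Python bytecode in parallel, so a process pool (`ProcessPoolExecutor` in the same module) is the more usual choice.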


New features that can simplify your data processing code — what are they and what do they improve?

Python 3.9 has accumulated a lengthy list of improvements, including some pretty significant changes such as a new type of parser. The new PEG parser takes a little more memory but is a little faster and should be able to handle certain cases better than the LL(1) parser. According to the release documentation, however, we’ll probably only start seeing the true effect of that from Python 3.10 onward. Python 3.9 also lets you use some built-in and standard library collection types as generic types without having to resort to the typing module. …
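The generic-types change can be sketched in a few lines. The function below is a hypothetical example (not taken from the article) that annotates its parameter and return value with the built-in `list` and `dict` directly, which works on Python 3.9 and later; earlier versions would need `typing.List` and `typing.Dict` instead.

```python
def count_words(lines: list[str]) -> dict[str, int]:
    """Count case-insensitive word occurrences across all lines."""
    counts: dict[str, int] = {}
    for line in lines:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


word_counts = count_words(["Python is fun", "python is fast"])
print(word_counts)
```

The annotations do not change runtime behavior; they are there for readers and for static type checkers.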


Using the examples of Naive Bayes, MaxEnt (Logistic Regression), HMM, and CRF

There are a few types of classical supervised machine learning algorithms: Naive Bayes, MaxEnt (also called Logistic Regression), Decision Trees, Hidden Markov Models (HMMs), Support Vector Machines, Conditional Random Fields (CRFs), Neural Networks, and some others. The top n list may differ depending on the field they are used in.

Most of those types have a few sub-types based on the kinds of tasks they are designed for and implementation approaches they leverage, for example, Bernoulli Naive Bayes, linear-chain CRFs, etc.

Models built using four of those main algorithms — Naive Bayes, MaxEnt, HMMs, and CRFs — can be split…


Essential tips for working with large amounts of data in Python

While you will only occasionally get to the point where you need to run a profiler to analyze your code and find bottlenecks, it’s definitely a good idea to get into the habit of writing efficient code and spotting the places where you can improve right away.

An important thing to keep in mind when looking for ways to optimize your code is that there will almost always be some trade-offs to accept. For example, you may have to choose between a faster-running piece of code and a simpler one. And simplicity here doesn’t mean just “code that looks less cool” (think…


Methods and tricks that I found useful for exploring and processing data on the go, back when I just started learning Python a few years ago.

Have you just started self-teaching Python? Great decision! Python is a pretty popular language in a few domains, particularly in Data Science, according to the 2018 Kaggle Machine Learning and Data Science survey. It has a number of libraries deemed industry-standard for heavy-duty tasks, such as NLTK, pandas, and spaCy, to name a few, as well as quite a few neat tricks up its sleeve for processing data on the go.

So if you’re learning Python for Data Science, take a look to make sure you know about the following options!

All examples use Python 3.6.

1. Reading files

To start off, you…

Anna Astori

I’m a Data Engineer for Amazon Alexa. AWS Certified Solutions Architect. Python developer. I’m also a big figure skating fan and a foodie.
