When experimenting with a project, it can be helpful (or simply fun!) to try and see which parts of the code are, for example, the most memory-consuming. If those places in the code pose a problem, it’s worth figuring out how they can be improved. Sometimes, all that’s needed is a couple of tweaks to shave off some time here and there. Other times, it’s a good idea to consider a completely different approach.
If you’re about to start a big data project, you’ll either be retrieving a lot of information or crunching big numbers on your machine, or both. However, if your code is sequential or synchronous, your application may start struggling.
Let’s see which concepts and Python libraries can improve your application’s performance in each case.
There are two dimensions along which you can speed up your program: I/O and CPU consumption. If your code does, for instance, a lot of file access or communication over the network, it’s I/O-bound. CPU-bound code involves heavy computation. …
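To make the distinction concrete, here’s a minimal sketch of handling an I/O-bound workload with threads from the standard library’s `concurrent.futures`. The `fetch` function and the URLs are hypothetical stand-ins: the `time.sleep` call simulates waiting on the network, which is exactly the kind of waiting that threads can overlap (for CPU-bound work you’d reach for `ProcessPoolExecutor` instead, since threads don’t help with heavy computation in CPython).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Simulated network call: while the thread sleeps, the GIL is
    # released, so the other "requests" can wait at the same time.
    time.sleep(0.2)
    return f"response from {url}"

urls = [f"https://example.com/{i}" for i in range(5)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# Five 0.2-second waits overlap, so the total is close to 0.2 s
# rather than the 1 s a sequential loop would take.
print(f"{len(results)} responses in {elapsed:.2f} s")
```

Run sequentially, the same five calls would take roughly the sum of their wait times; the thread pool collapses that to roughly the longest single wait.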
Python 3.9 has accumulated a lengthy list of improvements, with some pretty significant changes such as a new type of parser. The new PEG parser takes a little more memory but is a little faster, and it should handle certain cases better than the old LL(1) parser. According to the release documentation, though, we’ll probably only start seeing the true effect of that from Python 3.10 onward. Python 3.9 also lets you use some built-in and standard library collection types as generic types directly, without having to resort to the typing module. …
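As a quick illustration of that last point, here’s a small sketch (the `count_words` function is just a made-up example): on Python 3.9+ you can write `list[str]` and `dict[str, int]` in annotations directly, where older versions required `typing.List` and `typing.Dict`.

```python
# Python 3.9+: built-in collections work as generic types in annotations.
def count_words(lines: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# On Python 3.8 and earlier, the same annotations would need
# typing.List[str] and typing.Dict[str, int].
print(count_words(["to be or not to be"]))
# → {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```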
There are a few types of classical supervised machine learning algorithms: Naive Bayes, MaxEnt (also called Logistic Regression), Decision Trees, Hidden Markov Models (HMMs), Support Vector Machines, Conditional Random Fields (CRFs), Neural Networks, and some others. The top-n list may differ depending on the field in which they are used.
Most of those types have a few sub-types based on the kinds of tasks they are designed for and implementation approaches they leverage, for example, Bernoulli Naive Bayes, linear-chain CRFs, etc.
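To make one of those sub-types concrete, here’s a toy, from-scratch sketch of Bernoulli Naive Bayes (the spam/ham data and feature names are invented for illustration): features are binary, the model learns per-class priors and per-feature probabilities with Laplace smoothing, and prediction picks the class with the highest log-probability.

```python
import math
from collections import defaultdict

def train_bernoulli_nb(samples, labels):
    """Fit Bernoulli Naive Bayes on binary feature vectors."""
    n_features = len(samples[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for x, y in zip(samples, labels):
        class_counts[y] += 1
        for j, v in enumerate(x):
            feature_counts[y][j] += v
    priors, likelihoods = {}, {}
    total = len(samples)
    for c, count in class_counts.items():
        priors[c] = count / total
        # Laplace smoothing keeps estimates away from exactly 0 or 1.
        likelihoods[c] = [(feature_counts[c][j] + 1) / (count + 2)
                          for j in range(n_features)]
    return priors, likelihoods

def predict(x, priors, likelihoods):
    best, best_score = None, -math.inf
    for c in priors:
        score = math.log(priors[c])
        for j, v in enumerate(x):
            p = likelihoods[c][j]
            # A present feature contributes p; an absent one, 1 - p.
            score += math.log(p if v else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: features = [contains "free", contains "meeting"]
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = ["spam", "spam", "ham", "ham"]
priors, likelihoods = train_bernoulli_nb(X, y)
print(predict([1, 0], priors, likelihoods))  # → spam
```

Real projects would reach for an existing implementation, but the sketch shows why it’s called “naive”: each feature contributes to the score independently of the others.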
Models built using four of those main algorithms — Naive Bayes, MaxEnt, HMMs, and CRFs — can be split…
While you will only occasionally get to the point where you need to run a profiler to analyze your code and find bottlenecks, it’s definitely a good idea to get into the habit of writing efficient code and spotting the places where you can improve right away.
An important thing to keep in mind when looking for ways to optimize your code is that there will almost always be trade-offs to accept. For example, you can have either a faster-running piece of code or a simpler one. And simplicity here doesn’t mean just “code that looks less cool” (think…
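When you do reach the point of profiling, the standard library already has what you need. Here’s a minimal sketch using `cProfile` and `pstats` on a deliberately wasteful, made-up function to surface where the time goes:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately wasteful: recomputes the same small sums over and over.
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(10_000)
profiler.disable()

# Show the five entries with the most cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The report lists call counts alongside total and cumulative times per function, which is usually enough to spot the bottleneck before reaching for heavier third-party tools.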
Have you just started self-teaching Python? Great decision! Python is a pretty popular language in a few domains, and particularly in Data Science, according to the 2018 Kaggle Machine Learning and Data Science survey. It’s got a number of libraries deemed industry-standard for heavy-duty tasks (NLTK, pandas, and spaCy, to name a few), as well as quite a few neat tricks up its sleeve for processing data on the go.
So if you’re learning Python for Data Science, take a look to make sure you know about the following options!
All examples are using Python 3.6.
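As a taste of the kind of trick meant here (this particular example is mine, not from the article’s list): generator expressions let you process data “on the go,” producing one element at a time instead of materializing a whole list in memory.

```python
import sys

# A list comprehension builds every element up front...
squares_list = [n * n for n in range(1_000_000)]
# ...while a generator expression yields them lazily, one at a time.
squares_gen = (n * n for n in range(1_000_000))

print(sys.getsizeof(squares_list))  # several megabytes
print(sys.getsizeof(squares_gen))   # small, roughly constant size
print(sum(n * n for n in range(10)))  # → 285
```

The generator version can be fed straight into `sum`, `max`, or a `for` loop, which is handy when the data is too big to hold in memory at once.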
I’m a Data Engineer for Amazon Alexa. AWS Certified Solutions Architect. Python developer. I’m also a big figure skating fan and a foodie.