Docker containers are lighter than virtual machines, but in many cases images are far bigger than they need to be. For example, the official Docker Python image is approximately 900 MB, and that is only the Python runtime with no external libraries installed.
Python itself is not small: a typical Python installation needs close to 100 MB once uncompressed on disk. Of course, many of the included files aren't needed in most common cases (like the
turtle module). Is it possible to create a smaller Python Docker image?
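As a rough sketch of the idea (not the post's actual recipe), a Dockerfile can start from the much smaller Alpine-based image and strip modules most applications never import; the paths below are version-dependent and shown for illustration only:

```dockerfile
# A minimal sketch: the Alpine-based image is far smaller than the
# default Debian-based python image.
FROM python:3-alpine

# Removing modules most applications never use (e.g. turtle and the
# bundled test suite) shaves off a few more megabytes; the exact
# paths vary between Python versions.
RUN rm -rf /usr/local/lib/python3.*/turtle.py \
           /usr/local/lib/python3.*/test
```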
According to Kaggle, R, Matlab and Python are the favorite languages among competition winners. Is it too crazy to try to win a Kaggle competition using bash? It probably is; bash doesn't even support floating point arithmetic! But still, it is possible to beat the benchmark, and in fact this might be a record for the shortest beat-the-benchmark (BtB) ever:
How much can your name tell about you? I was curious to find out whether it is possible to predict the ethnicity of a person just by looking at their name. I know a lot about Hispanic names, so let's try to build a model that predicts Hispanic origin.
I recently had the chance to participate in a Kaggle competition. Because it was my first one, I made a lot of mistakes which ended up costing me a lot of time. I had actually started playing with the Titanic challenge before, but I lost my motivation when I saw competitors with close to 100% accuracy on the leaderboard. This ruined my competitive spirit, as there was no way to gauge the achievable accuracy for the competition.
The formulation of the problem was quite simple: predict the tags that a user would choose for a specific Stack Exchange question, given only the question text and its title. Participants were given a training dataset of approximately 6 million questions with their tags, and a test dataset of approximately 2 million questions without tags, for which the prediction was to be made.
This is the third post from a series dedicated to web scraping
Some websites implement measures against scraping. A common reason is to protect the data; for example, site scraping is sometimes used to gather product and pricing data to undercut rivals' prices for goods and services (as in the Ryanair case).
In this post I will show a technique that can be used to bypass cookie-based protections. As an example we are going to download data from [Kaggle](http://www.kaggle.com). This is useful if, for example, you want to download a big data file directly to an Amazon EC2 instance. Kaggle only allows browser-based downloads, because you have to accept the competition rules before getting access to the data.
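The general shape of the technique can be sketched with the standard library alone: log in once so the server sets its cookies, then reuse those cookies for the download. The URLs and form field names below are hypothetical; Kaggle's actual login form and download links will differ.

```python
# Sketch: download a cookie-gated file by logging in first and reusing
# the session cookies. URLs and form fields are illustrative only.
import http.cookiejar
import urllib.parse
import urllib.request


def download_with_login(login_url, form_fields, file_url, dest_path):
    # One cookie jar shared by every request made through this opener,
    # so cookies set by the login response accompany the download.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))

    # Step 1: POST the login form; the server replies with auth cookies.
    opener.open(login_url, data=urllib.parse.urlencode(form_fields).encode())

    # Step 2: the opener now presents those cookies, so the browser-only
    # link serves the file instead of redirecting to the login page.
    with opener.open(file_url) as resp, open(dest_path, "wb") as out:
        out.write(resp.read())
    return dest_path
```

The same idea works with the third-party `requests` library, where `requests.Session` persists cookies across calls with less ceremony.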
This is the second post from a series dedicated to web scraping
This is exactly what we see when we use "Inspect Element" in Google Chrome or Safari. In contrast, if we use the "View Source" tool, we get the original HTML that was downloaded from the server.
This is the first post from a series dedicated to web scraping
There are at least three ways to analyze the HTML from a page. To scrape fairly regularly formatted data from large documents, a regular expression is the right tool and will be faster than a generic parser. But bear in mind that HTML is not always well formed, and, as the name suggests, regular expressions are designed to deal with regular structures.
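A minimal sketch of the regex approach, on a made-up snippet whose rows all share the same rigid shape (for irregular real-world markup a proper parser is the safer choice):

```python
import re

# Hypothetical, regularly formatted HTML: every row looks the same.
html = """
<table>
  <tr><td>AAPL</td><td>171.50</td></tr>
  <tr><td>GOOG</td><td>138.20</td></tr>
</table>
"""

# Because each row has the same rigid shape, one pattern captures
# both cells of every row at once.
row_pattern = re.compile(r"<tr><td>(\w+)</td><td>([\d.]+)</td></tr>")
quotes = {ticker: float(price) for ticker, price in row_pattern.findall(html)}
print(quotes)  # {'AAPL': 171.5, 'GOOG': 138.2}
```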
In Coursera's Computational Investing, Prof. Tucker Balch, who was always looking for ways to make the course more interesting, proposed the following challenge:
Write the most succinct NumPy code possible to compute a 2D array that contains all "legal" allocations to 4 stocks:
- "Legal" means: The allocations are in 0.10 chunks, and the allocations sum to 1.00
- Only "pure" NumPy is allowed (no external libraries)
- Can you do it without a "for"?
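One possible pure-NumPy answer (not necessarily the most succinct) enumerates the full grid of chunk counts and keeps the rows that sum to 1.00:

```python
import numpy as np

# Enumerate every combination of chunk counts 0..10 for the 4 stocks
# (11**4 = 14641 rows), then keep those whose chunks sum to 10,
# i.e. whose allocations sum to 1.00. No Python "for" needed.
grid = np.indices((11, 11, 11, 11)).reshape(4, -1).T
allocations = grid[grid.sum(axis=1) == 10] / 10.0
print(allocations.shape)  # (286, 4)
```

The 286 rows match the combinatorial count: the number of ways to split 10 chunks among 4 stocks is C(13, 3) = 286.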