HTML version at: http://www1.ngtech.co.il/repo/release-3.4.7.html
I am happy to release the new RPMs of squid 3.4.7 for CentOS 6.5 32bit and 64bit CPUs and OpenSUSE 13.1 64bit.
The new release includes a couple of bug fixes and a CVE fix.
The links to the RPMs and a little apology are at the bottom of the article.
Time Space Continuum of CS
(Time Space trade-off or CPU vs Ram\Disk\Flash trade-off)
A long time ago I worked in a big warehouse and was asked to make the place work better using a “barcode reader”.
There were a couple of issues regarding the barcode reader and the barcode fonts to be used: the different reading speeds, and other factors such as the fact that there are international barcode fonts and also in-warehouse codes which required special fonts.
A couple of questions, for example: “If a product has an international barcode font stamp, will I need to put a new barcode stamp on it?”
Another one: “Would it be better to use a new product coding scheme that includes only numbers?”
For each and every step on the way there was an obstacle.
One of them was that since the more complex barcode stamps allowed letters, it took more time for the reader to analyze the font.
Another very big issue was that, with a very long line of customers waiting, the employees were much faster finding the code manually than going through the steps: “Picking up the reader, finding the stamp and reading the font”.
And that is leaving aside very heavy or bulky objects such as motors, and the roughly 30k++ items in the store.
My job was to organize and decide the right path for the warehouse.
Indeed the more veteran employees knew each and every item, if not by name then by type and by their own lookup pattern in the already computerized system.
On the other side were the new employees, who didn't know a thing about the warehouse and in most cases could not handle heavy client traffic without the barcode stamps as identifiers, while they were also learning from the veterans to look up the products instead of using the barcode readers.
The task was simple: “make the barcode stamps work!”
It took more than 2 years to take the place from 0 barcode to “more” barcode.
Indeed the “space” vs “time” trade-off has won... again!
While losing more time on the slow process of making the barcode stamps “work”, eventually even the veterans forgot how fast they could do the product lookup themselves.
The end
While this story happened years ago, not long before it the “y2k bug” was one of the mighty examples of a simple bad choice made when facing the “time vs space trade-off”!
In the early days of computers some programmers had to choose between storing the full year using all digits in a file on disk, or storing just the last 2 digits of the year.
The result was not that bad, but when y2k was about to arrive, some people started to write about what the consequences of this “bad” choice by these programmers could be.
Some panicked and others started working on the issue.
I cannot tell yet which one has won in this match of “time vs space”.
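A minimal sketch of that choice in Python. The function names and the fixed 1900 century are my assumptions for illustration and are not taken from any real system:

```python
def store_year(year: int) -> str:
    """Save space by keeping only the last two digits of the year."""
    return f"{year % 100:02d}"


def load_year(stored: str, century: int = 1900) -> int:
    """Rebuild the full year; the century is an assumption baked into the code."""
    return century + int(stored)


for year in (1985, 1999, 2000):
    stored = store_year(year)
    print(year, "->", stored, "->", load_year(stored))
```

Two bytes per date are saved, but the year 2000 comes back as 1900 unless someone spends the time to fix the reader.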
Computer scientists have been seeking the right solution to this trade-off for years.
Getting to the answer will always be complex since computers are very complex machines which work in many levels and layers that vary between physics, electricity and mathematics.
The questions about it are at the “bit” level at which a CPU operates and also at other levels such as alphanumeric.
One of the big examples is in audio and video compression.
Audio and video in computers are stored as numbers: audio as samples taken at a rate such as 44kHz, and images as a pixel matrix.
When you take a full 100% matrix of dots you can see that a 10MP image can take much more than 10MB, and if you want to make it a video using 10-90 frames per second you can forget about a 1 minute video fitting in 1MB.
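A quick back-of-the-envelope calculation in Python. The 3 bytes per pixel and the 30 frames per second are my assumptions, picked only to put numbers on the claim:

```python
PIXELS = 10_000_000        # a 10MP image
BYTES_PER_PIXEL = 3        # assumption: 24-bit RGB, no alpha channel
FPS = 30                   # assumption: a value inside the 10-90 range
SECONDS = 60

frame_bytes = PIXELS * BYTES_PER_PIXEL
video_bytes = frame_bytes * FPS * SECONDS

print(f"one frame:  {frame_bytes / 10**6:.0f} MB")
print(f"one minute: {video_bytes / 10**9:.0f} GB")
```

Under these assumptions a single raw frame is 30 MB and one raw minute is 54 GB.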
I am not an expert in compression algorithms but it seems that one way to reduce the consumed space is to pick a start point with a good picture and from there see what is being changed and save only these changes.
In many cases the changes consume less space than the full picture, and therefore this method fits those cases.
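A toy sketch of the "save only the changes" idea in Python. It treats a frame as a flat list of pixel values of equal length, which is my simplification; real video codecs are far more elaborate than this:

```python
def diff_frame(previous, current):
    """Return only the pixels that changed, as (index, new_value) pairs."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]


def apply_diff(previous, changes):
    """Rebuild the next frame from the previous frame and the saved changes."""
    frame = list(previous)
    for i, value in changes:
        frame[i] = value
    return frame


key_frame = [0, 0, 0, 0, 5, 5, 0, 0]      # the full "good picture"
next_frame = [0, 0, 0, 0, 0, 5, 5, 0]     # the object moved one pixel

changes = diff_frame(key_frame, next_frame)
print(changes)                             # [(4, 0), (6, 5)]
assert apply_diff(key_frame, changes) == next_frame
```

Only 2 of the 8 values are stored for the second frame, at the cost of the CPU time needed to compute and re-apply the difference.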
Another example from the audio field is to use something like “notes”.
Organize a list of sounds\notes in good quality and just point the player to the sound with a relative time of playback and a duration (MIDI files have used this concept).
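A small Python sketch of the concept. The NoteEvent structure is hypothetical and is not the real MIDI file format; the sample rate and sample size are my assumptions for the comparison:

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    start_ms: int      # when to start playing, relative to the start of the song
    duration_ms: int   # how long to play the sound
    note: int          # index into a bank of pre-recorded sounds


song = [NoteEvent(0, 500, 60), NoteEvent(500, 500, 64), NoteEvent(1000, 1000, 67)]

SAMPLE_RATE = 44_100   # assumption: CD-like sampling rate
BYTES_PER_SAMPLE = 2   # assumption: 16-bit mono

length_ms = max(e.start_ms + e.duration_ms for e in song)
raw_bytes = length_ms * SAMPLE_RATE * BYTES_PER_SAMPLE // 1000
print(len(song), "events versus", raw_bytes, "bytes of raw samples")
```

Three small records describe the same 2 seconds that would take 176400 bytes as raw samples, as long as the player already has the sounds.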
A subject more related to system admins can be understood through the “Big storage” software components.
When a sys-admin talks about storage he wants and\or needs a fast system.
So in a case where he needs to store a 10MB file each day (which is 3.65GB for a full year) the problem is not that big, other than that he needs to store 365 files per year.
These days some admins will laugh if a question regarding this “system” is even asked out loud.
They will indeed stop laughing when the device engineer explains to them that this is actually a device that should be sitting in space for the next 20 years and needs to be 100% operational with no downtime and 0 operation failures, not to speak of being out of reach of the sys-admin's “magic” touch of a restart.
But down on earth we have access most of the time to the disks and to the systems, and the sys-admin can do almost whatever is needed; it's his job to operate these systems, and not just one of them but hundreds of them.
Each and every one of these small systems contains logs which can be 10MB each day and can be more than 1GB.
Now that we are talking about more than 100GB per day, which is 36.5TB per year (and about 1 Million files per year), we need to think about the lookup of these files!
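The arithmetic can be checked with a few lines of Python. The 100 systems, the 1GB per system per day and the 27 log files per system per day are my assumptions, picked only for illustration:

```python
def yearly_totals(systems: int, mb_per_system_per_day: int,
                  files_per_system_per_day: int = 1, days: int = 365):
    """Return (files per year, terabytes per year) for a group of systems."""
    files = systems * files_per_system_per_day * days
    terabytes = systems * mb_per_system_per_day * days / 10**6
    return files, terabytes


print(yearly_totals(1, 10))           # one small device, one 10MB file per day
print(yearly_totals(100, 1000, 27))   # assumption: 100 systems, 1GB and 27 files each per day
```

The first call gives 365 files and 0.00365TB (3.65GB) per year; the second gives 985500 files and 36.5TB per year.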
Now, I hope, the issue is much better understood than before.
There are many methods to approach the issues with lots of objects\files and lots of space.
One layer specifically in this whole problem is how to distribute and then find these files since there is also Meta-data that should be stored along the way.
In the open source world many solutions were developed by many computer scientists and programmers, and two of them were “caught” by RedHat: Ceph and GlusterFS.
There is a big difference in the approach of these two products to the issues.
The main idea of GlusterFS is to take lots of hardware and give the client the option to maximize the performance of all this hardware using a software-only approach based on Linux.
You can theoretically take 10 machines with 24 disks each and use them all as one big replica volume, which gives you the option to read chunks of 1 file from 240 disks at the same time.
How fast would that be? (leaving aside how much power was invested in it)
One of the nicest things about it is that instead of reinventing everything, they took the file system and the OS and used everything they could from them as they are.
It uses FUSE to mount the volumes and a userspace daemon to give access to the volume.
One of the main drawbacks is that it's designed for Big-Data, which means lots of big files and not lots of small files.
Don't forget that lots of small files give the creeps to any clustered NAS solution.
There are ways to “convert” lots of small files into big files but it will always cost us something.
Time vs Space?? I wonder what time will bring to GlusterFS.
One of the problems with files is their metadata, which is not “the data” itself but is important enough to save.
In some cases the metadata is there to help the storage system work: to identify and locate the data.
In GlusterFS they took another approach!
They use a hash calculation of the file name to decide whether it should sit on one node or another.
This way the lookup for one file location can be very fast.
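A minimal sketch of hash-based placement in Python. The node names are hypothetical, and the SHA-256 plus modulo used here are stand-ins that I chose for illustration; this is not the actual GlusterFS implementation:

```python
import hashlib

BRICKS = ["node1", "node2", "node3", "node4"]   # hypothetical node names


def place(file_name: str, bricks=BRICKS) -> str:
    """Pick a node from the file name alone, with no central lookup table."""
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    return bricks[int.from_bytes(digest[:4], "big") % len(bricks)]


for name in ("access.log", "cache.log", "store.log"):
    print(name, "->", place(name))
```

Any client that knows the list of nodes computes the same answer for the same name, so the location costs a little CPU time instead of a stored index.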
It is a great idea but it has one specific flaw that Joe Julian (who is one of the main characters of GlusterFS) describes here:
http://joejulian.name/blog/dht-misses-are-expensive/
In summary, it describes the flaws of a DHT miss, which can cause a very long search and is very bad for small files.
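A simplified model of why a miss is expensive, in Python. The bricks are plain in-memory sets and the fallback scan is my own toy model of the idea; the real GlusterFS behaviour is the one described in the linked post:

```python
import hashlib

# Hypothetical cluster: each brick is just a set of file names.
bricks = [set() for _ in range(8)]


def hashed_index(file_name: str) -> int:
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % len(bricks)


def lookup(file_name: str):
    """Return (found, number of bricks that had to be asked)."""
    home = hashed_index(file_name)
    if file_name in bricks[home]:
        return True, 1                       # hit: a single brick answers
    asked = 1
    for i, brick in enumerate(bricks):       # miss: ask all the other bricks
        if i == home:
            continue
        asked += 1
        if file_name in brick:
            return True, asked
    return False, asked


bricks[hashed_index("a.log")].add("a.log")
print(lookup("a.log"))          # (True, 1)
print(lookup("missing.log"))    # (False, 8)
```

With lots of small files the time spent asking every brick can easily be bigger than the time spent reading the file itself.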
If you want to understand a bit more about DHT and how a couple of p2p implementations use it, take a look at these video lectures of ITS413, lecture 20 and lecture 21, by Dr Steven Gordon (http://ict.siit.tu.ac.th/~sgordon/)
Lecture 20: http://www.youtube.com/watch?v=qzGiAOQ-SqQ
Lecture 21: http://www.youtube.com/watch?v=qqv4OJ5Lc4E
* GlusterFS limits are bound by 2 to the power of 64 in almost every factor: maximum file size, maximum number of nodes, bricks etc.
* I have tried to use GlusterFS as a back-end for squid rock and ufs cache_dir in a couple of ways, and it seems that there is a way to utilize GlusterFS as a squid back-end, but it still needs more testing, so don't try it on production proxies without more details.
* Any notes and comments are wanted and welcome!
More about Storage to come...
= RPMs release notification
First, I apologize if anyone had trouble with my repo in the last month.
The cache server (hhm not squid) for the repo didn't revalidate the cached content, and therefore a file that was mistakenly available on the origin server for less than a minute was wrongly cached for a very long period of time.
Now I have replaced it with squid and hopefully it will not happen again.
The new release uses the new squid RPMs packaging format which consists of two files:
squid-%version%.rpm which contains the squid binary
squid-helpers-%version%.rpm which contains the squid helpers
The RPMs are at:
http://www1.ngtech.co.il/rpm/centos/6/x86_64/ 
http://www1.ngtech.co.il/rpm/centos/6/i686/
The package includes 3 RPMs: one for the squid core, one for the helpers, and one for debugging.
For x86_64:
http://www1.ngtech.co.il/rpm/centos/6/x86_64/squid-3.4.7-1.el6.x86_64.rpm
http://www1.ngtech.co.il/rpm/centos/6/x86_64/squid-debuginfo-3.4.7-1.el6.x86_64.rpm
http://www1.ngtech.co.il/rpm/centos/6/x86_64/squid-helpers-3.4.7-1.el6.x86_64.rpm
For i686:
http://www1.ngtech.co.il/rpm/centos/6/i686/squid-3.4.7-1.el6.i686.rpm
http://www1.ngtech.co.il/rpm/centos/6/i686/squid-debuginfo-3.4.7-1.el6.i686.rpm
http://www1.ngtech.co.il/rpm/centos/6/i686/squid-helpers-3.4.7-1.el6.i686.rpm
An update for 3.3 x86_64:
http://www1.ngtech.co.il/rpm/centos/6/x86_64/squid-3.3.13-1.el6.x86_64.rpm
http://www1.ngtech.co.il/rpm/centos/6/x86_64/squid-debuginfo-3.3.13-1.el6.x86_64.rpm
http://www1.ngtech.co.il/rpm/centos/6/x86_64/squid-sysvinit-3.3.13-1.el6.x86_64.rpm
An update for 3.3 i686:
http://www1.ngtech.co.il/rpm/centos/6/i686/squid-3.3.13-1.el6.i686.rpm
http://www1.ngtech.co.il/rpm/centos/6/i686/squid-debuginfo-3.3.13-1.el6.i686.rpm
This time, due to a special request, I have built an RPM for OpenSUSE 13.1 at:
http://www1.ngtech.co.il/rpm/opensuse/13.1/x86_64/squid-3.4.7-1.x86_64.rpm
SRPM at:
http://www1.ngtech.co.il/rpm/opensuse/13.1/SRPMS/squid-3.4.7-1.src.rpm
For each and every one of them there is an *asc* file which *doesn't* contain PGP and MD5 SHA1 SHA2 SHA256 SHA384 SHA512 hashes.
All the best,
Eliezer Croitoru