Slightly more than a month ago I left my job as team lead of eZ Components at eZ Systems to focus on something new. Over the past month I have been contemplating what to do next, and I realized that I do not want to take on a new full-time position right away. Instead, I will be available to work on (custom) PHP extensions and internals-related issues. Extensions are a great way around PHP's limitations and performance issues.
As my first project, I am working on a "QuickHash" extension for StumbleUpon. This extension circumvents PHP's hefty memory (and performance) overhead by providing more specific data structures. It currently implements integer sets and integer-to-integer hashes, and I am now adding integer-to-string and string-to-integer hashes. The QuickHash extension will be released under the PHP License, and I will dedicate another post to it later.
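QuickHash's source is not part of this post, so the following is only a sketch of the general idea, written in Python for brevity: a set of integers kept in one flat array of 64-bit values, using open addressing with linear probing, instead of a general-purpose PHP array. The class name `IntSet` and the sentinel value are illustrative assumptions, not QuickHash's API or implementation.

```python
from array import array


class IntSet:
    """Hypothetical integer set: open addressing over one flat int64 array.

    Illustration only; this is not QuickHash's implementation.
    """

    _EMPTY = -(2 ** 63)  # sentinel marking a free slot; cannot be stored itself

    def __init__(self, capacity: int = 8) -> None:
        size = 8
        while size < capacity * 2:  # keep the load factor at or below 0.5
            size *= 2
        self._slots = array("q", [self._EMPTY]) * size
        self._count = 0

    def _probe(self, key: int) -> int:
        """Return the slot holding `key`, or the first free slot on its probe path."""
        mask = len(self._slots) - 1
        i = hash(key) & mask
        while self._slots[i] != self._EMPTY and self._slots[i] != key:
            i = (i + 1) & mask  # linear probing
        return i

    def _grow(self) -> None:
        """Double the table and re-insert every stored key."""
        old = self._slots
        self._slots = array("q", [self._EMPTY]) * (len(old) * 2)
        self._count = 0
        for key in old:
            if key != self._EMPTY:
                self.add(key)

    def add(self, key: int) -> None:
        if key == self._EMPTY:
            raise ValueError("the sentinel value cannot be stored")
        if (self._count + 1) * 2 > len(self._slots):
            self._grow()
        i = self._probe(key)
        if self._slots[i] == self._EMPTY:
            self._slots[i] = key
            self._count += 1

    def __contains__(self, key: int) -> bool:
        if key == self._EMPTY:
            return False
        return self._slots[self._probe(key)] == key

    def __len__(self) -> int:
        return self._count
```

Every slot costs a fixed 8 bytes, and the load factor is kept at or below one half, so memory grows with the number of stored integers without any per-element object overhead.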
If, like StumbleUpon, you are interested in having work done on PHP or on a specific extension, feel free to contact me. I'd be happy to discuss things with you.

We are happy to announce the availability of eZ Publish 4.3.0 beta2. It provides several fixes and updates to the extensions provided.
We're still growing. This is becoming a theme.
This fixes the i18n calls, which changed in eZ Publish 4.3, so it only works with 4.3 and newer releases.
I've just released Xdebug 2.1.0beta3, which fixes a few crash bugs as well as the issue where headers sent from PHP scripts were not actually set.
You can find the full changelog here and get the latest version from the download page.

Following the alpha1 release of eZ Publish, we are happy to announce the beta1 release. It builds on alpha1 and provides further bug fixes and adjustments.
eZ Find is a native eZ Publish extension, now shipped with the various installations of the CMS. My previous post gives a short definition of how eZ Find works, how it is coupled with Solr, and how it relates to datatypes.
eZ Find is generally presented and sold as a search engine, so users (and developers) may expect a mechanism of this kind:
However, eZ Find's scope is broader than this functional scheme. This post describes a use case that is admittedly rather pointless but illustrative of an alternative use of eZ Find: building a tag cloud.
From a simple example, you can easily derive other use cases that make certain projects much easier to develop, such as content aggregators, portals and other complex mechanisms for navigating a catalogue.
Currently, the only reasonably optimised and working approach is a template operator that queries the database, notably the ezkeyword table. The ezwebin package provides the eztagcloud operator, which is easy to deploy and use.
<div> {eztagcloud( hash( 'class_identifier', 'billet', 'parent_node_id', 2 ) )} </div>
The native fetch functions cannot list a set of keywords filtered by the useful parameters (subtree, classes, etc.), so this is the only "cheap" and "optimised" way to do it. Operators often let advanced eZ Publish developers optimise certain processing, for example by reducing the number of SQL queries, or by simplifying algorithms that are tedious to express in the template language (such as computing the percentages used for the inline CSS styles in this operator).
Writing this kind of operator is not very accessible to occasional developers, and manipulating SQL directly is risky if you do not master the eZ Publish data model (versions, visibility, languages, permissions...). In addition, this operator encapsulates the algorithm that computes the percentages passed to "font-size" as an inline style. Fans of fully external CSS, or of accessibility, will therefore have to adapt this operator to their needs.
Behind the "geometric" term facet lies a concept that is ultimately quite simple and natural, which could be called "grouping results by a field", namely:
This example can be applied to any attribute or metadata of a class (name, dates, author, any attribute), and you can even obtain N facet lists over N different attributes and metadata.
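The facet idea can be sketched outside Solr in a few lines of Python: for one field, count how many matching documents carry each value. This only illustrates the concept; Solr computes facets from its index, and the document layout used here (dicts with a `tags` list) and the function name are assumptions.

```python
from collections import Counter
from typing import Iterable


def facet_counts(docs: Iterable[dict], field: str, limit: int = 100) -> list[tuple[str, int]]:
    """Count how many documents carry each value of `field`.

    Multi-valued fields (lists) count once per distinct value per document.
    Results are sorted alphabetically, like the 'sort', 'alpha' facet option
    of the eZ Find query in this post, and truncated to `limit` entries.
    """
    counts: Counter[str] = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if isinstance(values, str):
            values = [values]
        counts.update(set(values))  # a document counts once per value
    return sorted(counts.items())[:limit]


posts = [
    {"name": "a", "tags": ["php", "xdebug"]},
    {"name": "b", "tags": ["php", "ezfind"]},
]
print(facet_counts(posts, "tags"))
```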
This code example shows how to build the eZ Find query, retrieve the resulting facets on the "tags" attribute (of type "keywords"), and weight the keywords with a simplified algorithm (I cheated a little on this part, since it is not the point of the demonstration).
{def $search_keywords=fetch( ezfind, search,
                             hash( 'query', '',
                                   'facet', array( hash( 'field', 'billet/tags', 'sort', 'alpha', 'limit', 100 ) ),
                                   'class_id', array( 'billet' ),
                                   'filter', array( 'not', 'billet/tags:""' ),
                                   'subtree_array', array( 2 ) ) )}
{def $search_extras_keywords=$search_keywords['SearchExtras']}
{def $search_count_keywords=$search_keywords['SearchCount']}
<li id="blog_block_{$bloc_count}" class="colonne_block">
    <h1>Tags ezfind :</h1>
    <div class="tagclouds {$current_css}">
    {foreach $search_extras_keywords.facet_fields[0].nameList as $facetID => $name}
        {def $keyword_count = $search_extras_keywords.facet_fields[0].countList[$facetID]}
        {def $percent = $keyword_count|div( $search_count_keywords )|mul( 200 )|floor|sum( 100 )}
        <a href={concat( $root_blog_node.url_alias, '/(tag)/', $name )|ezurl()} style="font-size: {$percent}%" title="{$keyword_count} billets taggés '{$name}' // ">{$name|wash()}</a>,
        {undef $percent $keyword_count}
    {/foreach}
    </div>
</li>
{undef $search_extras_keywords $search_keywords $search_count_keywords}
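The weighting in this template is the pipeline `$keyword_count|div( $search_count_keywords )|mul( 200 )|floor|sum( 100 )`: the font size grows linearly from 100% with the share of matching posts that carry the tag. The same arithmetic in Python, handy for checking values outside a template (the function name and the zero-count guard are additions for this sketch):

```python
import math


def tag_font_size_percent(keyword_count: int, search_count: int) -> int:
    """Mirror of the template pipeline div -> mul(200) -> floor -> sum(100).

    keyword_count: facet count for one tag (posts carrying the tag)
    search_count:  total number of posts matched by the search
    Returns a CSS font-size percentage, normally between 100 and 300.
    """
    if search_count <= 0:
        return 100  # guard added here; the template itself divides blindly
    return math.floor(keyword_count / search_count * 200) + 100


print(tag_font_size_percent(5, 10))  # 200
```

A tag carried by every matching post ends up at 300%, one carried by none at 100%.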
<!-- eZ Find: This field type is dedicated to ez publish keywords. -->
<fieldtype name="keyword" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern=", *" />
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!--<filter class="solr.LowerCaseFilterFactory"/>-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern=", *" />
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldtype>
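To see roughly what the index-time chain of this "keyword" field type does to a keywords value, here is a Python approximation. It is an assumption-based model, not Lucene's code: the real tokenizer and filters also track positions and offsets, and details such as empty-token handling may differ. The function name is hypothetical.

```python
import re


def analyze_keywords_for_index(value: str, stopwords: set[str]) -> list[str]:
    """Approximate the index-time analyzer of the 'keyword' field type.

    PatternTokenizerFactory pattern=", *": split on a comma plus any spaces.
    TrimFilterFactory: strip surrounding whitespace from each token.
    StopFilterFactory ignoreCase="true": drop stopwords, case-insensitively.
    LowerCaseFilterFactory is commented out at index time, so tokens (and
    therefore facet values) keep their original case.
    RemoveDuplicatesTokenFilterFactory only drops duplicates at the same
    token position (e.g. from synonyms), so it is not modelled here.
    """
    lowered_stopwords = {w.lower() for w in stopwords}
    tokens = []
    for piece in re.split(r", *", value):
        token = piece.strip()
        if not token:
            continue  # skip empty pieces, e.g. from a trailing comma
        if token.lower() in lowered_stopwords:
            continue
        tokens.append(token)
    return tokens


print(analyze_keywords_for_index("eZ Find, Solr ,  tags,le", {"le"}))
```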
eZUpgrade is a stand-alone application (not an eZ Publish extension) that automates the process of upgrading an eZ Publish installation.
Using eZUpgrade is as easy as checking it out on the server where the upgraded installation will reside, entering some configuration settings, and running ezupgrade from the command line.
VLD is a tool that I started working on years ago to visualise the opcode arrays in PHP. Opcode arrays are what PHP's compiler generates from your source code and can be compared to the assembler code that a C compiler generates. However, instead of being executed directly by the CPU, they are executed by PHP's interpreter.
Over the years I have added functionality, with help from Ilia and others, to show more information. For example, Ilia added a more verbose dumping format for opcodes (through the vld.verbosity setting), whereas I added routines to find out which ops in oparrays can never be reached. A very simple example of the latter is shown here:
If we run the above through VLD with php -dvld.active=1 test1.php, we get the following output (I removed the part about the script body itself):
Function test:
filename:       /tmp/test1.php
function name:  test
number of ops:  9
compiled vars:  none
line    #*     op          fetch ext  return  operands
------------------------------------------------------
   2    0  >   EXT_NOP
   4    1      EXT_STMT
        2      ECHO                           'Hello%21%0A'
   5    3      EXT_STMT
        4    > RETURN                         true
   7    5*     EXT_STMT
        6*     ECHO                           'This+will+not+be+executed.%0A'
   8    7*     EXT_STMT
        8*   > RETURN                         null
End of function test.
Every opcode that has a * after its number (like 5*) is code that cannot be reached and could be eliminated from the oparrays by an optimiser.
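The idea behind the * markers can be sketched as a reachability walk: start at op 0, follow fall-through and jump targets, and flag every op that is never visited. This is a minimal Python model, not VLD's C code; the `Op` tuple and its fields are hypothetical simplifications, fed here with the nine ops of test() from the dump above.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """Simplified opcode model (hypothetical, not VLD's C structures)."""
    name: str
    targets: tuple[int, ...] = ()  # ops this op may jump to
    falls_through: bool = True     # False for JMP, JMPZNZ, RETURN, THROW


def unreachable_ops(ops: list[Op]) -> list[int]:
    """Return the indices of ops that cannot be reached from op 0."""
    reachable: set[int] = set()
    pending = [0]
    while pending:
        i = pending.pop()
        if i in reachable or i >= len(ops):
            continue
        reachable.add(i)
        pending.extend(ops[i].targets)
        if ops[i].falls_through:
            pending.append(i + 1)
    return [i for i in range(len(ops)) if i not in reachable]


# The nine ops of test() from the first VLD dump.
test1_ops = [
    Op("EXT_NOP"), Op("EXT_STMT"), Op("ECHO"), Op("EXT_STMT"),
    Op("RETURN", falls_through=False),
    Op("EXT_STMT"), Op("ECHO"), Op("EXT_STMT"),
    Op("RETURN", falls_through=False),
]

print(unreachable_ops(test1_ops))  # [5, 6, 7, 8]
```

The result matches the ops marked 5* to 8* in the dump.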
The dead code analysis routines have also made their way into Xdebug, which uses them in its code coverage functionality to highlight dead code. This is mostly useful when you collect code coverage while running unit tests, as you can do with PHPUnit.
Recently I have been working on new functionality to visualise all the code paths that make up each function. These new routines sit on top of the dead code analysis routines. Every branch instruction (such as if, but also for and foreach) is analysed and a list of branches is created. Each branch contains the line on which it starts, the first and last opcode numbers that belong to it, and the other branches it can jump to. A branch can have no linked branches (for example when a return or throw statement is found), one linked branch (for an unconditional jump) or two linked branches (for a conditional branch instruction). However, be aware that internally, PHP's opcodes don't always reflect the source code exactly.
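Splitting an op array into branches in this way is essentially classic basic-block construction. The following Python sketch uses a simplified, hypothetical op model (not VLD's C implementation): a branch starts at op 0, at every jump target, and right after every op that jumps or ends execution, and each branch records the branches it links to. Fed with the 22 ops of the test() function from the second VLD example in this post, it yields the same sop/eop/out values as VLD's branch list; line ranges are left out of the sketch.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """Simplified opcode model (hypothetical, not VLD's C structures)."""
    name: str
    targets: tuple[int, ...] = ()  # jump targets, in VLD's out1/out2 order
    falls_through: bool = True     # False for JMP, JMPZNZ, RETURN, THROW


class Branch(NamedTuple):
    sop: int              # first op of the branch
    eop: int              # last op of the branch
    out: tuple[int, ...]  # first ops of the branches it links to


def find_branches(ops: list[Op]) -> dict[int, Branch]:
    """Split an op array into branches keyed by their first op number."""
    # A branch starts at op 0, at every jump target, and after every
    # op that jumps or ends execution.
    starts = {0}
    for i, op in enumerate(ops):
        starts.update(op.targets)
        if (op.targets or not op.falls_through) and i + 1 < len(ops):
            starts.add(i + 1)
    ordered = sorted(starts)
    branches: dict[int, Branch] = {}
    for n, sop in enumerate(ordered):
        eop = (ordered[n + 1] if n + 1 < len(ordered) else len(ops)) - 1
        last = ops[eop]
        out = list(last.targets)
        if last.falls_through and eop + 1 < len(ops):
            out.insert(0, eop + 1)  # fall-through is listed first
        branches[sop] = Branch(sop, eop, tuple(out))
    return branches


# The 22 ops of test() in test2.php; JMPZNZ jumps to 18 or 9.
test2_ops = [
    Op("EXT_NOP"), Op("EXT_STMT"), Op("ASSIGN"),                     # 0-2
    Op("IS_SMALLER"), Op("EXT_STMT"), Op("JMPZNZ", (18, 9), False),  # 3-5
    Op("POST_INC"), Op("FREE"), Op("JMP", (3,), False),              # 6-8
    Op("EXT_STMT"), Op("IS_SMALLER"), Op("JMPZ", (15,)),             # 9-11
    Op("EXT_STMT"), Op("ECHO"), Op("JMP", (17,), False),             # 12-14
    Op("EXT_STMT"), Op("ECHO"), Op("JMP", (6,), False),              # 15-17
    Op("EXT_STMT"), Op("ECHO"), Op("EXT_STMT"),                      # 18-20
    Op("RETURN", (), False),                                         # 21
]

for number, branch in find_branches(test2_ops).items():
    print(f"branch #{number}: sop {branch.sop}; eop {branch.eop}; out {branch.out}")
```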
Once all the branches and their links are found, another algorithm runs to figure out which paths can be created out of all the branches. It is best to illustrate this with an example. So let us look at the following script:
In this script we have a for-loop with a nested if construct. When we run this script through VLD (with php -dvld.verbosity=0 -dvld.dump_paths=1 -dvld.active=1 test2.php), we get the following output (again, only the test() function and with some white space modifications):
Function test:
filename:       /tmp/test2.php
function name:  test
number of ops:  22
compiled vars:  !0 = $i
line    #*     op          fetch ext  return  operands
------------------------------------------------------
   2    0  >   EXT_NOP
   4    1      EXT_STMT
        2      ASSIGN                         !0, 0
        3  >   IS_SMALLER             ~1      !0, 10
        4      EXT_STMT
        5    > JMPZNZ              9          ~1, ->18
        6  >   POST_INC               ~2      !0
        7      FREE                           ~2
        8    > JMP                            ->3
   6    9  >   EXT_STMT
       10      IS_SMALLER             ~3      !0, 5
   7   11    > JMPZ                           ~3, ->15
   8   12  >   EXT_STMT
       13      ECHO                           '-'
   9   14    > JMP                            ->17
  12   15  >   EXT_STMT
       16      ECHO                           '%2B'
  14   17  > > JMP                            ->6
  15   18  >   EXT_STMT
       19      ECHO                           '%0A'
  16   20      EXT_STMT
       21    > RETURN                         null

branch: #  0; line:  2- 4; sop:  0; eop:  2; out1:  3
branch: #  3; line:  4- 4; sop:  3; eop:  5; out1: 18; out2:  9
branch: #  6; line:  4- 4; sop:  6; eop:  8; out1:  3
branch: #  9; line:  6- 7; sop:  9; eop: 11; out1: 12; out2: 15
branch: # 12; line:  8- 9; sop: 12; eop: 14; out1: 17
branch: # 15; line: 12-14; sop: 15; eop: 16; out1: 17
branch: # 17; line: 14-14; sop: 17; eop: 17; out1:  6
branch: # 18; line: 15-16; sop: 18; eop: 21
path #1: 0, 3, 18,
path #2: 0, 3, 9, 12, 17, 6, 3, 18,
path #3: 0, 3, 9, 15, 17, 6, 3, 18,
End of function test.
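The three paths in this listing can be reproduced with a depth-first walk over the branch table in which a path may use each branch-to-branch link at most once (so a loop body is entered once per path), and a path is only recorded when it reaches a branch without outgoing links. This is a sketch of one rule that yields exactly this output, not necessarily how VLD implements it; the link table is copied from the out1/out2 columns above, and the function name is hypothetical.

```python
def find_paths(links: dict[int, tuple[int, ...]], start: int = 0) -> list[list[int]]:
    """Enumerate paths from `start` to an exit branch (one with no links).

    A link (from, to) may be followed at most once per path, which bounds
    loops; walks that run out of unused links before an exit are dropped.
    """
    paths: list[list[int]] = []

    def walk(branch: int, path: list[int], used: set[tuple[int, int]]) -> None:
        path = path + [branch]
        if not links[branch]:
            paths.append(path)
            return
        for nxt in links[branch]:  # out1 first, then out2
            if (branch, nxt) not in used:
                walk(nxt, path, used | {(branch, nxt)})

    walk(start, [], set())
    return paths


# out1/out2 of each branch in the test() dump of test2.php
test2_links = {
    0: (3,), 3: (18, 9), 6: (3,), 9: (12, 15),
    12: (17,), 15: (17,), 17: (6,), 18: (),
}

for n, p in enumerate(find_paths(test2_links), start=1):
    print(f"path #{n}: " + ", ".join(map(str, p)) + ",")
```

With this table the printed lines match VLD's three `path #` lines exactly.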
This dump consists of a few different parts. First, there is some basic information: the file and function name, the number of ops (22) and the compiled variables. The second part is a dump of all the opcodes that make up this function. The last part contains information about all the branches and the possible paths. This information is a bit hard to visualise in its textual form, so I have also added code that dumps it to a file format that the GraphViz tool "dot" can use to create a pretty graph. For this, we re-run the previous PHP invocation as php -dvld.dump_paths=1 -dvld.verbosity=0 -dvld.save_paths=1 -dvld.active=1 test2.php. This creates the file /tmp/paths.dot that "dot" can use. If we run dot -Tpng /tmp/paths.dot > /tmp/paths.png, we end up with the following picture:
If we put this graph next to the code, we can explain how it works. Every branch is named after the number of the first opcode in that branch:
op #0 starts the function and includes the assignment of $i in line 4.
op #3 is the loop test in line 4. If the condition doesn't match, we jump to op #18 on line 15, which echoes the newline.
op #9 is the if condition on line 6.
op #12 is when the if condition returns true and
op #15 is when the if condition returns false.
op #17 follows both op #12 and op #15 and makes sure there is a jump to the counting expression in op #6.
op #6 is the post-increment operation on line 4, which is then again followed by op #3 to check whether the end of the loop has been reached.
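The exact layout of the .dot file VLD writes is not shown in this post, so the following is only a rough illustration of the idea: turn the branch table of the test2.php example into a GraphViz digraph with one node per branch (labelled with its op range) and one edge per link. The function name and node naming scheme are hypothetical.

```python
def branches_to_dot(branches: dict[int, tuple[int, int, tuple[int, ...]]],
                    name: str = "test") -> str:
    """Render {branch: (sop, eop, links)} as a GraphViz 'dot' digraph."""
    lines = [f'digraph "{name}" {{', "  node [shape=box];"]
    for number, (sop, eop, _) in sorted(branches.items()):
        # "\\n" in the f-string emits a literal \n, dot's label line break
        lines.append(f'  b{number} [label="branch #{number}\\nops {sop}-{eop}"];')
    for number, (_, _, links) in sorted(branches.items()):
        for target in links:
            lines.append(f"  b{number} -> b{target};")
    lines.append("}")
    return "\n".join(lines)


# sop, eop and out1/out2 of every branch in the test2.php dump
test2_branches = {
    0: (0, 2, (3,)), 3: (3, 5, (18, 9)), 6: (6, 8, (3,)),
    9: (9, 11, (12, 15)), 12: (12, 14, (17,)), 15: (15, 16, (17,)),
    17: (17, 17, (6,)), 18: (18, 21, ()),
}

print(branches_to_dot(test2_branches))
```

Saving the printed text as paths.dot and running dot -Tpng paths.dot > paths.png renders it the same way as VLD's own file.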
This is of course a very simple example, but it also works for (multiple) classes and functions in a file. You just need to make sure to tell VLD that you don't want the code executed as the output could be very large. You can use the vld.execute=0 php.ini setting for that.
I hope this new functionality can shed some light on how loops and similar constructs work in PHP. To play with the code, check out VLD from my SVN with svn co svn://svn.xdebug.org/svn/php/vld/trunk vld. You can also view the code online at http://svn.xdebug.org/cgi-bin/viewvc.cgi/vld/trunk/?root=php. Look out for a new release coming soon!