community news (ez.no)  eZ systems employee

› eZ Publish 4.3.0 beta1 released

Following the alpha1 release of eZ Publish we are happy to announce the beta1 release. It builds on alpha1 and provides further bugfixes and adjustments.

23/02/2010 10:06 am (UTC)   Community news (ez.no)

gilles guirand

› eZ Find and its alternative uses: building a tag cloud

eZ Find is a native eZ Publish extension, now available in the various installations of the CMS. My previous post gives a short description of how eZ Find works, how it is coupled with Solr, and how it relates to datatypes.

eZ Find is generally presented and sold as a search engine, so users (and developers) may expect a workflow along these lines:

  • I type a free-text expression
  • I submit my search
  • I get a list of results, and I apply some sorting (alphabetical, by date, by relevance) and some of the available filters (by section, by facet, etc.)

However, the scope of eZ Find is broader than this functional pattern. This post describes a use case that is admittedly fairly useless in itself, but indicative of an alternative use of eZ Find: building a tag cloud.

From this simple example one can easily derive other use cases that greatly simplify the development of certain projects, such as content aggregators, portals and other complex navigation mechanisms within a catalogue.

How do you build a tag cloud on eZ Publish?

Currently the only reasonably optimised and functional way to do this is to use a template operator that queries the database directly, in particular the ezkeyword table. The ezwebin package provides the eztagcloud operator, which is easy to deploy and use.

  • Here is an example of calling the operator in a template
<div>
   {eztagcloud( hash( 'class_identifier', 'billet', 'parent_node_id', 2 ) )}
</div>
 

Advantages of the operator

The native fetch functions cannot list a set of keywords according to the useful parameters (subtree, classes, etc.), so this is the only "economical" and "optimised" way to proceed. Operators often let advanced eZ Publish developers optimise certain processing, for example by reducing the number of SQL queries, or by simplifying algorithms that are laborious to express in the template language (for example the calculation of the percentages for the inline CSS styles in this operator).

Drawbacks of the operator

Writing this kind of operator is not very accessible to occasional developers, and manipulating SQL directly is dangerous if the eZ Publish data model is poorly understood (handling of versions, visibility, languages, permissions...). In addition, this operator encapsulates the algorithm that computes the percentages passed to "font-size" as an inline style. Fans of fully external CSS, or of accessibility, will therefore have to adapt the operator to their needs.

How do you build a tag cloud with eZ Find?

Understanding the concept of facets

Behind this "geometric" term lies a concept that is ultimately quite simple and natural, which could be called "grouping the results by a field". That is:

  • Suppose a search result contains 100 blog posts
  • Across these 100 posts, 20 distinct associated keywords can be listed
  • Among these 20 keywords, keyword A is associated with 10 posts, keyword B with 5 posts, and so on

This example can be applied to all the attributes and metadata of a class (name, dates, author, any attribute), and one can even obtain N facet lists on N different attributes and metadata.
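The grouping described above can be sketched in a few lines of Python. This is only an illustration of what a facet count is, not of how Solr computes it; the `posts` list and the `facet_counts` helper are made up for the example.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count how many documents carry each distinct value of `field`."""
    counts = Counter()
    for doc in documents:
        # set() so that a value repeated inside one document counts once
        for value in set(doc.get(field, [])):
            counts[value] += 1
    return counts


# Hypothetical search result: four posts and their keywords
posts = [
    {"name": "post 1", "tags": ["ez find", "solr"]},
    {"name": "post 2", "tags": ["ez find"]},
    {"name": "post 3", "tags": ["solr", "facets"]},
    {"name": "post 4", "tags": []},
]

tag_facets = facet_counts(posts, "tags")
for tag, count in sorted(tag_facets.items()):
    print(tag, count)
```

Calling the same helper with another field name gives a second facet list over the same result set, which is the "N lists on N attributes" idea.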

Building the keyword cloud with facets

This code example shows how to build the eZ Find query, retrieve the resulting facets on the "tags" attribute of type "keywords", and weight the keywords with a simplified algorithm (I cheated a little on this aspect, since it is not the point of the demonstration).

{def $search_keywords=fetch( ezfind , search,
      hash( query , '',
        'facet', array( 
            hash('field', 'billet/tags', 'sort', 'alpha', 'limit', 100 )),
        'class_id', array('billet'),
        'filter', array('not', 'billet/tags:""'),
        'subtree_array', array(2)
        ))}
    
{def $search_extras_keywords=$search_keywords['SearchExtras']}
{def $search_count_keywords=$search_keywords['SearchCount']}
 
<li id="blog_block_{$bloc_count}" class="colonne_block">
   <h1>Tags ezfind :</h1>
    
    <div class="tagclouds {$current_css}">
    {foreach $search_extras_keywords.facet_fields[0].nameList as $facetID => $name}

        {def $keyword_count = $search_extras_keywords.facet_fields[0].countList[$facetID]}
        {def $percent = $keyword_count|div( $search_count_keywords )|mul( 200 )|floor|sum( 100 ) }
            <a href={concat( $root_blog_node.url_alias, '/(tag)/', $name )|ezurl()} style="font-size: {$percent}%" title="{$keyword_count} billets taggés '{$name|wash()}' // ">{$name|wash()}</a>, 
        {undef $keyword_count $percent}
 
    {/foreach}
 
    </div>
</li>
{undef $search_extras_keywords $search_keywords $search_count_keywords}
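The weighting used in the template above (share of matching posts, scaled by 200 and offset by 100) can be written as a small Python function to see which font sizes it produces. The function name and the default parameters are illustrative; integer arithmetic is used here instead of the template's div/mul/floor chain, which gives the same floor in the mathematical sense.

```python
def tag_font_percent(keyword_count, total_results, spread=200, base=100):
    """Font size in percent: base plus the keyword's share scaled by spread."""
    if total_results <= 0:
        # Nothing matched, so fall back to the base size
        return base
    # floor(keyword_count / total_results * spread) + base, in integer arithmetic
    return keyword_count * spread // total_results + base


# With 100 matching posts: 10 posts -> 120%, 5 posts -> 110%
sizes = [tag_font_percent(n, 100) for n in (10, 5, 100)]
print(sizes)
```

A keyword carried by every post ends up at 300%, and a rare keyword stays close to 100%.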
 

A few tips and keys to understanding

  • Searching on an empty string (query , '') is a technique for exploring all indexed content, applying only the useful filters and limitations (class_id and subtree_array, for example)
  • The 'NOT' operator is not native and requires a small 'hack' proposed by Bruce Morrison, which will probably be available in future versions of eZ Find
  • By default, facets on 'keywords' datatypes are presented in lowercase. This is not a bug but a feature intended to normalise case across similar spellings (eZ Publish, ez Publish, Ez Publish, etc.). However, when you are confident in the quality of your input (because you use an autocompletion extension, for example), it may be desirable not to force lowercase. To do so, disable the solr.LowerCaseFilterFactory filter in the file /extension/ezfind/java/solr/conf/schema.xml:
<!-- eZ Find: This field type is dedicated to ez publish keywords.  --> 
<fieldtype name="keyword" class="solr.TextField" positionIncrementGap="100"> 
     <analyzer type="index"> 
       <tokenizer class="solr.PatternTokenizerFactory" pattern=", *" /> 
       <filter class="solr.TrimFilterFactory" />               
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> 
       <!--<filter class="solr.LowerCaseFilterFactory"/>--> 
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> 
     </analyzer> 
     <analyzer type="query"> 
       <tokenizer class="solr.PatternTokenizerFactory" pattern=", *" /> 
       <filter class="solr.TrimFilterFactory" /> 
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> 
       <filter class="solr.LowerCaseFilterFactory"/> 
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> 
     </analyzer> 
</fieldtype>
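To see what the lowercase filter changes, here is a rough Python imitation of the first steps of the index analyzer above: split on the `, *` pattern, trim each token, and optionally lowercase. It is a sketch for intuition only. The stop word and duplicate-removal filters are deliberately left out, empty tokens are dropped for convenience, and the function name is invented.

```python
import re


def analyze_keywords(raw, lowercase=True):
    """Imitate tokenizer + trim + optional lowercase for a keyword field."""
    tokens = re.split(r", *", raw)            # same pattern as the tokenizer
    tokens = [t.strip() for t in tokens]      # trim surrounding whitespace
    tokens = [t for t in tokens if t]         # simplification: skip empties
    if lowercase:
        tokens = [t.lower() for t in tokens]  # the filter being disabled
    return tokens


raw = "eZ Publish, ez Publish,  Solr"
print(analyze_keywords(raw))                   # both spellings collapse
print(analyze_keywords(raw, lowercase=False))  # spellings stay distinct
```

With lowercasing, the two spellings of "eZ Publish" become the same facet value; without it they are counted as two different tags, which is why disabling the filter only makes sense when the input is already consistent.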
 
23/02/2010 12:11 am (UTC)   Gilles Guirand

ez projects

› First release of eZUpgrade

eZUpgrade is a stand-alone application (not an eZ Publish extension) that automates the process of upgrading an eZ Publish installation.

Using eZUpgrade is as easy as checking it out on the server where the upgraded installation will reside, entering some configuration settings, and running ezupgrade from the command line.

Check it out.

22/02/2010 3:40 pm (UTC)   eZ Projects

derick rethans

› More source analysis with VLD


VLD is a tool that I started working on years ago to visualise the opcode arrays in PHP. Opcode arrays are what PHP's compiler generates from your source code and can be compared to assembler code that is generated by a C compiler. Instead of it being directly executed by the CPU, it is instead executed by PHP's interpreter.

Over the years I've been adding some functionality, also aided by Ilia and some others, to show more information. For example, Ilia has added a more verbose dumping format for opcodes (through the vld.verbosity setting), whereas I have added routines to find out which ops in oparrays can never be reached. A very simple example of the latter is shown here:

<?php
function test()
{
    echo "Hello!\n";
    return true;

    echo "This will not be executed.\n";
}

If we run the above through VLD with php -dvld.active=1 test.php, you'll see the following output (I removed the part about the script body itself):

Function test:
filename:       /tmp/test1.php
function name:  test
number of ops:  9
compiled vars:  none
line     # *  op           fetch  ext  return  operands
---------------------------------------------------------
   2     0  >   EXT_NOP
   4     1      EXT_STMT
         2      ECHO                           'Hello%21%0A'
   5     3      EXT_STMT
         4    > RETURN                         true
   7     5*     EXT_STMT
         6*     ECHO                           'This+will+not+be+executed.%0A'
   8     7*     EXT_STMT
         8*   > RETURN                         null

End of function test.

Every opcode that has a * after the number (like in 5*) is code that cannot be reached, and can possibly be eliminated from the oparrays by an optimiser.
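The idea behind the starred ops can be sketched as a reachability walk over the opcode list. The Python below is a simplified illustration and not VLD's actual routine: it assumes each op is given as a name plus a list of jump targets, and that only RETURN and JMP stop execution from falling through to the next op.

```python
def reachable_ops(ops):
    """Return the indexes of ops that can be reached from op 0."""
    seen = set()
    todo = [0]
    while todo:
        index = todo.pop()
        if index in seen or index >= len(ops):
            continue
        seen.add(index)
        name, targets = ops[index]
        todo.extend(targets)               # follow explicit jumps
        if name not in ("RETURN", "JMP"):
            todo.append(index + 1)         # otherwise fall through
    return seen


# The nine ops of test() from the dump above; none of them jumps
ops = [
    ("EXT_NOP", []), ("EXT_STMT", []), ("ECHO", []),
    ("EXT_STMT", []), ("RETURN", []),
    ("EXT_STMT", []), ("ECHO", []), ("EXT_STMT", []), ("RETURN", []),
]
dead = sorted(set(range(len(ops))) - reachable_ops(ops))
print(dead)  # [5, 6, 7, 8]
```

The four indexes it reports are the ones marked with * in the dump.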

The dead code analysis routines have also made their way into Xdebug which uses them for the code coverage functionality to highlight dead code. This mostly makes sense if you are running your code coverage together with unit tests such as you can do with PHPUnit.

Recently I've been working on some new functionality to visualise all the code paths that make up each function. These new routines sit on top of the routines that do dead code analysis. Every branch instruction (such as if, but also for and foreach) is analysed and a list of branches is created. Each branch contains information about the line on which the branch starts, the starting and ending opcode numbers that belong to the branch, as well as which other branches this branch can jump to. There can be either no linked branches (when for example a return or throw statement is found), one linked branch (for an unconditional jump) or two linked branches (on a branch instruction). However, you need to be aware that internally, PHP's opcodes don't always reflect the source code exactly.

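Once all the branches and their links are found, another algorithm runs to figure out which paths can be created out of all the branches. It is best to illustrate this with an example. So let us look at the following script:

<?php
function test()
{
    for ( $i = 0; $i < 10; $i++ )
    {
        if ( $i < 5 )
        {
            echo "-";
        }
        else
        {
            echo "+";
        }
    }
    echo "\n";
}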

In this script we have a for-loop with a nested if construct. When we run this script through VLD (with php -dvld.verbosity=0 -dvld.dump_paths=1 -dvld.active=1 test2.php) we get the following output (again, only the test() function and with some white space modifications):

Function test:
filename:       /tmp/test2.php
function name:  test
number of ops:  22
compiled vars:  !0 = $i
line     # *  op             fetch  ext  return  operands
-----------------------------------------------------------
   2     0  >   EXT_NOP
   4     1      EXT_STMT
         2      ASSIGN                             !0, 0
         3  >   IS_SMALLER                 ~1      !0, 10
         4      EXT_STMT
         5    > JMPZNZ                  9          ~1, ->18
         6  >   POST_INC                   ~2      !0
         7      FREE                               ~2
         8    > JMP                                ->3
   6     9  >   EXT_STMT
        10      IS_SMALLER                 ~3      !0, 5
   7    11    > JMPZ                               ~3, ->15
   8    12  >   EXT_STMT
        13      ECHO                               '-'
   9    14    > JMP                                ->17
  12    15  >   EXT_STMT
        16      ECHO                               '%2B'
  14    17  > > JMP                                ->6
  15    18  >   EXT_STMT
        19      ECHO                               '%0A'
  16    20      EXT_STMT
        21    > RETURN                             null

branch: #  0; line:  2- 4; sop:  0; eop:  2; out1:   3
branch: #  3; line:  4- 4; sop:  3; eop:  5; out1:  18; out2:   9
branch: #  6; line:  4- 4; sop:  6; eop:  8; out1:   3
branch: #  9; line:  6- 7; sop:  9; eop: 11; out1:  12; out2:  15
branch: # 12; line:  8- 9; sop: 12; eop: 14; out1:  17
branch: # 15; line: 12-14; sop: 15; eop: 16; out1:  17
branch: # 17; line: 14-14; sop: 17; eop: 17; out1:   6
branch: # 18; line: 15-16; sop: 18; eop: 21
path #1: 0, 3, 18,
path #2: 0, 3, 9, 12, 17, 6, 3, 18,
path #3: 0, 3, 9, 15, 17, 6, 3, 18,
End of function test.

This dump consists of a few different parts. First of all we can see some basic information containing the name, the number of ops (22) and the compiled variables. The second part is a dump of all the opcodes that make up this function. The last part contains information about all the branches and the possible paths. This information is a bit hard to visualize in its textual form, so I've also added some code that dumps this information to a file format that the GraphViz tool "dot" can use to create a pretty graph. For this we re-run the previous PHP invocation as php -dvld.dump_paths=1 -dvld.verbosity=0 -dvld.save_paths=1 -dvld.active=1 test2.php. This creates the file /tmp/paths.dot that "dot" can use. If we run dot -Tpng /tmp/paths.dot > /tmp/paths.png we end up with the following picture:

vld-paths.png

If we put this graph next to the code, we can explain how this works. Every branch is named by the number of the first opcode in that branch:

  • op #0 starts the branch that contains the assignment of $i on line 4 (the ASSIGN itself is op #2).

  • op #3 is the loop test on line 4. If the condition doesn't match, we jump to op #18 on line 15, which echoes the newline.

  • op #9 is the if condition on line 6.

  • op #12 is when the if condition returns true and

  • op #15 is when the if condition returns false.

  • op #17 sits behind both op #12 and op #15 and makes sure there is a jump to the counting expression in op #6.

  • op #6 is the post increment operation on line 4 which will then again be followed by op #3 to check whether the end of the loop has been reached.
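One way to get from the branch list to the three paths in the dump is a depth-first walk that refuses to take the same link between two branches twice within one path. The Python below is a sketch of that reading of the output, not a transcription of VLD's code; the `branches` table is copied from the out1/out2 columns above.

```python
def edge_in_path(path, source, target):
    """True if the path already goes directly from source to target."""
    return any(a == source and b == target for a, b in zip(path, path[1:]))


def find_paths(branches, start=0):
    """Enumerate paths, using each branch-to-branch link at most once."""
    paths = []

    def walk(node, path):
        path = path + [node]
        extended = False
        for out in branches[node]:
            if not edge_in_path(path, node, out):
                walk(out, path)
                extended = True
        if not extended:
            paths.append(path)     # nowhere new to go: the path ends here

    walk(start, [])
    return paths


# Branch number -> linked branches (out1, out2) from the dump
branches = {
    0: [3], 3: [18, 9], 6: [3], 9: [12, 15],
    12: [17], 15: [17], 17: [6], 18: [],
}
paths = find_paths(branches)
for number, path in enumerate(paths, 1):
    print("path #%d: %s" % (number, ", ".join(map(str, path))))
```

Allowing a branch to appear twice while forbidding a repeated link is what lets the loop test in branch 3 show up both at the start and at the end of paths #2 and #3.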

This is of course a very simple example, but it also works for (multiple) classes and functions in a file. You just need to make sure to tell VLD that you don't want the code executed as the output could be very large. You can use the vld.execute=0 php.ini setting for that.

I hope this new functionality can shed some light on how loops etc. work in PHP. In order to play with the code, you need to check out VLD from my SVN with svn co svn://svn.xdebug.org/svn/php/vld/trunk vld. You can also view the code online at http://svn.xdebug.org/cgi-bin/viewvc.cgi/vld/trunk/?root=php. Look out for a new release coming soon!

19/02/2010 12:27 pm (UTC)   Derick Rethans

thomas koch

› Zookeeper for web developers

Have you ever developed any kind of distributed system? When doing so for the first time, you're very likely to fall into the trap of the Fallacies of Distributed Computing. I've done so, you'll do so too.
Now zookeeper is an application that helps you implement many distributed protocols on top of it. The hard work of implementing fault tolerance, assuring consistency and that kind of stuff is done by zookeeper in the background. A fault-tolerant zookeeper cluster consists of at least three servers running zookeeper (zk). A client can connect to any zk server and issue read and write requests. Zookeeper guarantees that a write either fails or is applied consistently across the cluster, and that every client sees updates in the same order. A read is served by the server you are connected to, so it can lag slightly behind the latest write unless you ask that server to sync first.
Zookeeper exposes a filesystem-like hierarchy of so-called znodes. Every znode can have children but also stores data. The data stored in a znode is assumed to be small (less than 1 MB). Clients can subscribe to different events on a znode and will be notified of changes in the znode itself or its children.
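To make the znode model a bit more concrete, here is a toy in-memory version in Python. It is not a zookeeper client and talks to no server; the class and method names are invented for this sketch. It only mimics three properties: znodes form a path hierarchy, each znode holds a small payload, and a watch fires once and then has to be set again.

```python
class ZNodeTree:
    """Toy in-memory model of a znode hierarchy. Not a zookeeper client."""

    def __init__(self):
        self.data = {"/": b""}     # path -> payload stored in the znode
        self.data_watches = {}     # path -> callbacks awaiting a data change
        self.child_watches = {}    # path -> callbacks awaiting a child change

    def create(self, path, value=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            raise KeyError("parent znode does not exist: " + parent)
        if path in self.data:
            raise KeyError("znode already exists: " + path)
        self.data[path] = value
        self._fire(self.child_watches, parent, "child")

    def set(self, path, value):
        if path not in self.data:
            raise KeyError("no such znode: " + path)
        self.data[path] = value
        self._fire(self.data_watches, path, "data")

    def get(self, path, watch=None):
        value = self.data[path]
        if watch is not None:
            self.data_watches.setdefault(path, []).append(watch)
        return value

    def children(self, path, watch=None):
        if watch is not None:
            self.child_watches.setdefault(path, []).append(watch)
        prefix = path.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self.data if p.startswith(prefix))
        return sorted(n for n in names if n and "/" not in n)

    @staticmethod
    def _fire(watches, path, kind):
        # Watches are one-shot: remove them before calling back
        for callback in watches.pop(path, []):
            callback(kind, path)


tree = ZNodeTree()
events = []
tree.create("/sessions")
tree.create("/sessions/abc123", b"user=42")
tree.get("/sessions/abc123", watch=lambda kind, path: events.append((kind, path)))
tree.set("/sessions/abc123", b"user=42;cart=3")  # fires the watch
tree.set("/sessions/abc123", b"user=42;cart=4")  # watch already used up
print(events)
print(tree.children("/sessions"))
```

The second write produces no event because the watch was consumed by the first one; a client that wants continuous notifications re-registers the watch each time it reads.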
Some usage examples come to my mind, which could be especially interesting for PHP developers:
  • save the PHP session in zk and have it therefore available to all web servers
  • save a shopping cart
  • save the online status of a user (for a chat system)
  • synchronize configuration files
  • master election: many monitoring servers, but only one is active at any given time
  • logging system: bookkeeper is a zk contribution that receives your logs and assures they'll be kept safe
  • create a lock on a document in a CMS

Companies that use zk include Facebook, Yahoo, Rackspace and, as rumour has it, also Twitter.
Now this is all fine and dandy; the only sad thing is that there's no PHP binding yet. There are bindings for Java, C, Python and Perl. So if you're desperately searching for the PHP extension you should write next, just take the zk C binding and expose it to PHP!
Maybe you're a student and would like to participate in the Google Summer of Code? It should be possible to find a mentor for this project either in the PHP or zk project.
For GSOC the project could include a PHP session implementation in the extension code and the possibility of persistent zk sessions across PHP calls.
If you're in the region of eastern Switzerland: There'll be a presentation about zookeeper at the Webtreff Kreuzlingen on one of the next Mondays.
19/02/2010 10:42 am (UTC)   Thomas Koch

derick rethans

› New Xdebug browser extensions


Years ago I wrote about a Firefox extension that allows you to start an Xdebug debugging session by clicking on an icon in Firefox' status bar. For some unexplained reason, this extension is no longer available through Firefox' addon-site. Although I have a copy at http://xdebug.org/files/xdebug_helper-0.3.1-fx.xpi for archival purposes, there are now a few other browser extensions that do the same thing.

easy Xdebug

easy Xdebug is an extension that serves as a replacement for the now unavailable Xdebug helper extension. It's written by Brecht Vanhaesebrouck of eLime. The extension was originally tested with Netbeans but it also seems to work fine with Komodo.

Xdebug enabler

Xdebug enabler is an extension for Google's Chrome browser. It "allows you to enable and disable triggering Xdebug from with in Chrome. Useful if you are a web developer using an IDE that supports Xdebug like Eclipse with PDT." It's written by 'remailednet' and available through the Google Chrome Extensions website.

JavaScript 'enabler'

I also ran across a blog post by 'Caleb G' from HigherVisibility. Instead of making an extension for a specific browser, he outlines two JavaScript bookmarklets that allow you to start and stop an Xdebug debugging session.
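For context, Xdebug itself can be told to start or stop a debugging session through the XDEBUG_SESSION_START and XDEBUG_SESSION_STOP request parameters. Whether the helpers above use these parameters or set a cookie instead is not something this post states, so the Python sketch below only illustrates the request-parameter mechanism; the function name and the default IDE key are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

XDEBUG_PARAMS = ("XDEBUG_SESSION_START", "XDEBUG_SESSION_STOP")


def with_xdebug_param(url, start=True, idekey="netbeans-xdebug"):
    """Return url with a parameter that starts or stops an Xdebug session."""
    parts = urlsplit(url)
    # Keep the existing query, minus any earlier Xdebug trigger
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in XDEBUG_PARAMS
    ]
    if start:
        query.append(("XDEBUG_SESSION_START", idekey))
    else:
        query.append(("XDEBUG_SESSION_STOP", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


start_url = with_xdebug_param("http://localhost/index.php?page=2")
stop_url = with_xdebug_param(start_url, start=False)
print(start_url)
print(stop_url)
```

The IDE key has to match what the debugging client listens for, which is why it is a parameter rather than a constant.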

17/02/2010 1:33 pm (UTC)   Derick Rethans

derick rethans

› Joind.in's API


I speak at many conferences and more and more of those conferences are using a service called joind.in. The joind.in website allows attendees of conferences to leave feedback for the speakers, organisers and sponsors. For me as a speaker this feedback by attendees is very important (as long as the comments are constructive). I use those comments to tweak and improve my presentations if I give them at a later moment.

The joind.in website also provides an API that allows you to talk to the service from other applications and sites. I've now integrated this in my site (at the talks page). It uses jQuery's Ajax functionality to talk to the backend, which queries (and caches) the joind.in API requests. In order to make API calls, you need to make POST requests to a specific URL. The URL depends on what type of object you want to use. For example, there is http://joind.in/api/talk for requesting information about talks, and http://joind.in/api/user for fetching information about users.

Requests can be made in either XML or JSON. A simple example to request all comments for a specific talk can be done with something like:




The do_post_request() code I lifted from Wez' page, and looks like:

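<?php
// POST $data to $url through PHP's HTTP stream wrapper and return the response body.
function do_post_request( $url, $data, $optional_headers = null )
{
        $params = array(
                'http' => array(
                        'method' => 'POST',
                        'content' => $data
                )
        );
        if ( $optional_headers !== null) {
                $params['http']['header'] = $optional_headers;
        }
        $ctx = stream_context_create( $params );
        $fp = fopen( $url, 'rb', false, $ctx );
        if ( $fp === false ) {
                throw new Exception( "Could not open $url" );
        }
        $response = stream_get_contents( $fp );
        fclose( $fp );
        return $response;
}
?>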

I am also fetching the full name for each user. Because this could mean that I have to do a lot of requests I am caching them with eZ Components' Cache component.
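The caching idea is independent of the cache library used. As a rough sketch in Python (not the eZ Components Cache API), the backend can keep each response keyed by URL and request body and only call the remote service again once the entry is older than a chosen lifetime. The class name, the one-hour lifetime and the payload string are all assumptions made for the example.

```python
import time


class CachedApi:
    """Cache responses of a POST-style fetch function for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.time):
        self.fetch = fetch        # callable(url, payload) -> response
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the expiry can be tested
        self.store = {}           # (url, payload) -> (stored_at, response)

    def get(self, url, payload):
        key = (url, payload)
        now = self.clock()
        hit = self.store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]         # still fresh: no remote request
        response = self.fetch(url, payload)
        self.store[key] = (now, response)
        return response


calls = []


def fake_fetch(url, payload):
    # Stand-in for the real HTTP POST; records each remote call
    calls.append(url)
    return "response %d" % len(calls)


api = CachedApi(fake_fetch)
payload = '{"hypothetical": "request body"}'
first = api.get("http://joind.in/api/user", payload)
second = api.get("http://joind.in/api/user", payload)
print(first, second, len(calls))
```

Looking up the same user twice results in a single remote request, which is the saving that matters when every comment triggers a user lookup.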

You can see the code operational on the talks page, by clicking on the little joind.in logo after each talk that is on the site. If JavaScript is disabled, the logo turns into a link that takes you to the joind.in site with all the comments.

12/02/2010 11:27 pm (UTC)   Derick Rethans

community news (ez.no)  eZ systems employee

› eZ Publish 4.3.0alpha1 released

We are happy to announce the release of eZ Publish 4.3.0alpha1. This release contains several improvements and bugfixes for the kernel, and is accompanied by several new extensions. A few highlights are provided below.

10/02/2010 2:23 pm (UTC)   Community news (ez.no)

