Monday, July 28, 2008

The context of programming

Recent events in my place of work have lead me to ponder the concept of programming context once again. I suspect it is a pervasive concept as I seem to come across it on a regular basis in quite different circumstances. Let me explain.

If I am asked to write a program that accepts two numbers and returns a third number, being the product of the two, then there is not a lot more I need to know. Perhaps knowing the possible range of input numbers would be useful, but really this is a pure mathematical problem and has no context.

If I am asked to write a program that accepts two numbers and returns a third number - the number of residential addresses in a database that fall between those two numbers - then there is quite a bit more I need to know. I need to know whether just street numbers alone should be checked, or whether street names should be included (5th Avenue, for example). Even within street numbers alone, what about flat numbers? It's a bit more complex than the first example as there is a context. I.e. what are we actually trying to achieve here?

Now in a third example, I am asked to write a program that accepts two numbers (x and y) and returns a third number which is the number of active users who have been logged in between x hours and y hours. Again, now the context is complex. How do I define a "logged in user"? Do I define one interactive session as one user, or do I need to reduce this to unique users because some may be logged in more than once. What about "special" users such as system supplied IDs? Should they all be counted, none, or only some?

But the third example is even more complex than I have shown so far. Consider that this function needs to work in a function test environment, in an integrated test environment, and in production. There are some processes that occur only in production, some only in test and some on both. Will this affect the outcome? Is testing on the test system going to be good enough to know it works in production?

Hang on a minute - aren't we talking about system programming? Well, maybe yes and maybe no. If this program is needed to manage software licensing, then it's a system program. But, if it is needed to manage the number of customer service representatives assigned to different parts of the call centre, then no it is not system programming. If it is being used to achieve load balancing for application service jobs then it could go one way or the other.

Now that was a somewhat contrived example, but it helps me to illustrate my point. In all three cases, take two numbers and return a third. The first example I would expect absolutely any programmer to be able to achieve. The second example I would expect any programmer to be able to achieve if complete requirements are provided. If the problem is only defined as I described, then you would need an analyst programmer. For the third example, who would you give the job to, generically speaking?

This is where I see a massive gap. I, myself, have been fortunate to have been involved in both application and systems programming fairly extensively and even if I say so myself I think I'm pretty good at covering off the sorts of issues described above. It also means I am frequently seeing other programmers who are failing to account for the "system" level factors.

In a specific recent case, a developer insisted that my team (who are a development & test support team) replace one version of a program with another so that it 'behaved like production'. That should have been the first red flag. (I was not involved at this stage so I don't know whether I would have caught this at the start.) Why was the test system behaving differently to production?

Well, the developer got his wish and proceeded to begin to make his related code work. Meanwhile, large numbers of other people were tripping over the problems introduced. After several days of analysing the problems we concluded we had to put things back the way they were. To quote Spock - "The needs of the many far outweigh the needs of the few." This programmer was looking in far too narrow a context in defining what needed to be done. He had no concept of the roles this particular program was playing, nor the large number of dependencies it had. For instance, an automated regression testing suite completely failed because of the change.

But perhaps the most spectacular case of lack of context that I have ever encountered was in a previous role.

The product in question was enterprise software being used all around the world and it was incredibly complex. Customers had requested the ability to use off-the-shelf reporting tools (such as Crystal Reports) to create their own reports. The development organisation realised this meant less work on such things for us and considered this was a good idea - but dangerous. Great they can write their own reports, but how to let them into a massive, complex database without (a) massive confusion and (b) the opportunity to corrupt it.

So a plan was hatched to deliver a new library (for self containment) of logical files (views) which would collate the data into meaningful constructs and, importantly, be read-only. My team (again in development & test support) figured out how to deal with this new library for the purposes of the testing done on it. For the most part we just manually created and destroyed these libraries as required and used some of our own toolset which, importantly, is not delivered to customers.

At some point I got to thinking...How are we going to deliver this? The initial response I got from the designer was "on a tape/CD with the rest of it." To cut a long story short, I soon proved that it is impossible to ship a library full of logical files. Period. Can't be done. I took this information back to the designer, along with a rough sketch design of a simple tool which could alleviate the problem, and also be useful within the development shop.

The response? "We didn't budget for that." * Sigh *.

In the end, I wrote a quick (hack) version of that tool on the day we packaged the software. Some months later someone contacted me saying that there was a bug in my code. I sent them to the designer to have it sorted out.

Thanks for reading.
Allister.

Monday, July 21, 2008

2E - Development Standards (Defensive Programming)

Part two in the series and takes a look at defensive programming techniques and how these help you to create reliable programs.

The following a guidelines for creating robust code.

Always check for a divide by zero (runtime) error by checking the divisor field for zero value prior to performing the *DIV operation.

Never move numeric fields into a field with a smaller domain.
With RPG this can cause truncation of the value and with RPG ILE Pre 8.0 will cause an RPG ILE runtime error.

Ensure that your iteration values and counters are large enough to cater for your anticipated maximum.

Ensure that your field sizes for database attributes are sized sufficiently to cater for the number of records anticipated.

Ensure that your arrays are sized to cater for the maximum number of array records anticipated.
Thus avoiding array index out of bounds issues. Remember to balance this with not overly sizing the array and thus causing a performance degragetion.

Always ensure that any substring operations utilising position and length parameters are within the range of the target field. Thus avoiding substring out of bounds.

Remember to check the function options for your function to ensure appropriate behaviour, especially close down program and reclaim resources.

Never use the WRK context in new programs. Use LCL and HLL.
If you choose to fix up old WRK fields, remember to check all other internals within your object and ensure that fields aren't used. This used to be used a trick in the old days to bypass the paramters passing limits. This was pre-arrays and when structure files where a pain.

Avoid HLL User Source and Programs. If you do write user source for RPG first convert program to RPG ILE and write one user source. The sign of a well managed and maintained 2E model is the percentage of HLL code versus generated code. If your models are more than 5% HLL then you have issues and a history of developers who have misunderstood the purpose and philosphies of model based development. IMHO.

Always pass parameters to user source. Do not rely on the generated field names.

Avoid use of CON context as these values are not available for impact analysis and localisation.

Avoid manual source modification. Use a program and the pre-processor directives to amend code automatically.


Source Modification – Special Notes.

Manual source modification must be avoided at all costs. If source is required to be overridden then a source modifying program should be written to automatically perform this function after generation and before compilation using the pre-processor.

In Summary:-

- Do NOT consider source modification unless absolutely necessary.

- Avoid the use of fields that use incremental counters for naming i.e. LCL Context YLnnnn.

- Avoid adding parameters above a field declared as a modified field. Therefore, always try to ensure that your fields that are parameters that are modified are at the top of the parameter declaration list.

- Try to avoid usage of fields higher than 32k. If 64k is required considered looping in 32k blocks as the 64k limit would one day be exceeded.

- Consider a naming standard to help you to easily identify a modified program and its modifying program.

- Consider centralised methods to ensure source modification programs have been successful rather then depend on a developer having to manually check the modified source.

Thanks for reading.
Lee.

Monday, July 14, 2008

Knowledge capture & use in technical support communities - Part 3

In Part 2 I looked at how to capture expert knowledge. In this final instalment, I will offer suggestions on where to store that knowledge, some differing uses of these concepts and offer some final insights into why I wrote the original version of this article.

Once again, I lead off with a small overlap to set the scene.

Electronic storage for fast access

Following the structuring process above introduces one significant disadvantage in a paper-based documentation repository. Frequent referencing to other documents causes the reader to flip pages or have multiple documents arranged on the desk in order to complete a single process.

Simply storing these documents electronically, such as Microsoft Word files in a LAN folder, is not enough to deal with this problem, as it merely shifts the emphasis to clicking the mouse constantly and still does not help out with the comprehension of the process as a whole.

The answer is hyper-linking. Every single reference to another document should be turned into a hyper-link to that document. This guarantees simple, fast, unfettered access to the linked information and, in most cases, an easy return path to the original document.

It is important to choose the document delivery technology with this method of usage in mind. Although MS Word provides inter-document hyper-linking capabilities and is an excellent document editor, it leaves a lot to be desired as a document reader. Better choices are Lotus Notes, or HTML. Lotus Notes serves as both editor and fast viewer. HTML requires a separate editor (and there are many to choose from), but has a ubiquitous interface if you are considering a large audience for the documentation.

Although I would recommend Lotus Notes as an excellent delivery mechanism, it is important to avoid the use of Notes 'Views'. Rather a default, or 'home', document should be launched easily from a bookmark and all navigation from that point should be via hyper-linking. This gives immediacy to the navigation by making it all point-and-click. This typically avoids the need to work with Notes' view characteristics like 'twisties' and scrolling. (Note that both of these can be used within a Notes document if desired.)

Tech-centric versus customer-centric

At the beginning of this series, I referred to the workings of a technical support team. For such a team, there are two key audiences for a documentation repository following the guidelines described in this document.

Perhaps the most important audience, in terms of covering exposure, is the team itself. Sharing the expert knowledge around the team ensures that individual team members do not become indispensable.

However, often the single biggest gain to be made from good documentation is by providing the team's customers with 'self help' information. The audience level for such documentation will necessarily be much lower than for internal documentation and will therefore take more time to prepare, but the ability to point customers to a self-help document for a common problem can save the team enormous amounts of time. This time could be used to improve the internal documentation and then you can reap all of the benefits.

The author's experience

In writing these articles, I am speaking from experience. I have built a customer-centric and a tech-centric database along these lines and they have been very successful. Both were built as Lotus Notes databases.

The first one built was the customer-centric repository, which was deployed to up to 150 developers and testers. This originally came out of a FAQ document introduced during a pilot implementation of a source configuration management product. The documents were listed in a Notes view in broad, simple categories. Each document title was phrased as a question, the answer to which lay inside. Many of the documents referred to others for pre-requisite information.

My team used this database to help keep the load down in our helpdesk-like operation. I estimate that once the database had matured, over 90% of customers who were referred to one of the documents did not need to make further contact with us for that issue. It is clear from the numbers and the nature of calls we received that many customers went straight to this repository and never needed to call us.

The second database was tech-centric and used all of the principles described above. Although the closure of the business meant this repository never quite made it to 'complete' status, the documents that were created (over 60) proved the points I have made above. New members in the team who were technically competent (that's why we hired them) but who had scant knowledge of our processes were able to perform fairly complex tasks purely from the information contained in these documents. Although it took them sometimes magnitudes longer to perform the task than one of the experts, the fact remains that they successfully completed the task with low or no input from the experts in person.

Reflection

In the matter of conversational language being more effective than rote instructions, I can recall being given two articles to read in preparation for a 'knowledge workshop' (that unfortunately never happened).

After many years now, I can only remember what one of those documents said. The one I remember spoke about how conversational language was much more effective than brief and often brusque language that is found in much technical documentation. It made the link to the over-the-shoulder technique and postulated some of the thinking that I have expounded above.

Some time well after I had created the customer-centric database I came across these two documents and re-read them. I immediately tied the conversational language ideas to what had happened in our database and saw that it was true. After re-reading the documents, I also realised why I had remembered this document and not the other. This one was very conversational, in contrast to the much more formal (and fact-and-figure quoting) style in the other document.

To this day, I do not remember what the other document said. Perhaps sometime I will come across the pair again and the cycle will repeat.

Thanks for reading.
Allister.

Sunday, July 6, 2008

2E - Development Standards (Performance)

This is the first part in a complete series of articles I intend to post regarding development best practices and standards for the CA 2e (Synon) development tool. The aim of publishing the guides is to educate, collaborate and enhance the standards by receiving community feedback. After all, no one person can know everything but the wider community can contribute.

Many of these tips I have learnt over the years and quite a lot have been sent to me by interested parties around the world. A big thank you to you all.

I will publish the complete documents on the 2E wiki (soon) with full acknowledgements. (See my links section below).

In the meantime I will publish some selected extracts on this blog just to get your thought processes flowing.

Performance

There are many considerations when programming for performance in CA 2e. A few are highlighted here. This is by no means an exhaustive list. My next technical post will relate to Defensive Programming techniques......

I'd be interested to hear of others from the community in general and would be happy to include them on this blog and the final wiki document.

Drop unused relations where possible and set others to appropriate level. i.e. OPTIONAL or USER etc. This cuts down unneccessary code and processing as well as making your action diagrams more easily navigable.

Avoid FLD for passing parameters for non command line type programs. Will use less PAGs.

Tactically use *QUIT to reduce I/O. Especially when programs have lots of nested validation logic. Very useful when validating. Use the *QUIT inside subroutines to halt further processing. Provides cleaner message feedback to end user and reduces response times.

Avoid Dynamic Selection Access Paths.

Avoid Virtuals, especially Virtuals with relations to files with Virtuals.
Virtuals have their place. Query access paths or in scenarios where they are always used. Best practice in this area is to avoid virtuals and to get data as appropriate.

Ensure programs do not close down if called iteratively.e.g. in a loop or inside USER: Process Record etc for a RTVOBJ or PRTFIL etc. Typically used for externalised RTVOBJs.

Consider sharing ODPs.(Open Data Paths)

Consider usage of shared subroutines.
Minimise the amount of code and reduces object size and makes debugging easier.

Consider usage of Null Update Suppression within your CHGOBJs. Very useful for batch programs.

Avoid unnecessary selector/position fields on subfile selectors.

Avoid contains (CT) selection on control panels

Ensure arrays are appropriately sized.
Too large they will consume more memory.

Reduce file I/O by loading small reference files that are regularly read into arrays upon opening the program. Good examples here wsould be files like TRANSACTION TYPE or XYZ RULES.

Reduce I/O by only getting reference data only on key change. This will depend on the chosen access path of course.

When writing to an IFS write fewer larger chunks of data rather than multiple small chunks. Overhead is opening, positioning and closing the IFS file.

Pass reference data down through call stack rather than re-retrieve in the lower level function.

Consider physical file access paths for fix programs (version 7.0+) or write SQL to perform the basic updates.

Use OS/400 default values to initialise fields on a database file rather than write a program.

CHG/CRT v CRT/CHG. Use the appropriate one depending on likelihood of the records existence.

Avoid *CONCAT and *SUBSTRING native in 2e for long string manipulation. If concatenating for long strings it is possible to keep a counter of where you are to save the concatenation operation time to identify the current position in the string.

Avoid RTV message to build strings with high usage.

Consider DSPFIL instead of DSPTRN, especially true if de-normalisation is designed in the database with any total duplicated into the header record.

Do not perform a *RTVCND for blanks.
Check for blank first in the action diagram.

Consider a database file field for *RTVCND if approriate.

Be aware of the affect of a scan limit for strict selection criteria as the screen will not pause load processing until the subfile if full or EOF reached. Particularly for large files.

Consider the naming conventions of your access paths to ensure that underlying indexes can be shared when key subsets are apparent and ensure that are built and implemented in the correct order to reduce indexes.

Ensure access paths have correct maintenance option i.e. *IMMED, *DLY or *REBLD.


Thanks for reading.
Lee.