Wednesday, July 3, 2019

Enhancement for 2E (Come on Broadcom.....)

I've been having a think about a few enhancements for 2E in the hope that the new owners of the products will invest in them. Leaving aside the roadmap ideas and up voted ideas on the idea wall (which need acting upon, especially REST API support), here are a few other ideas for consideration.

Enhancement One

Add an option in Display All Functions to get at the details screen for a given function to show the model object display screen.  Currently you have to do a Usage (wait…) and the do an 8 next to the 000 level function.


I use this screen to see what list an object is associated with etc.


Enhancement Two

The Open Functions screen is really handy but it would be more useful if we can search for a function via its CPF Name.

Enhancement Three

Action Diagram templates to display (fully) what code they are generating.  A good example of this is the RTVOBJ where the return code is set to *Normal at initialisation and automatically set to *Record does not exist in ‘Record Not Found’ user point.  I would be good if this was shown.

There are other examples where the templates could be expanded to show the exact processing to be generated.


I've seen so much bad coding where people are initialising these values not realising what the template generates.  As 2E is supposed to insulate you from the generated code it would make sense to show in AD what actually occurs.... wouldn't it....

Perhaps an expert mode like F21 (WRKACTJOB) to determine how much detail is shown.  I do honestly believe there is a lot of bad code out there due to template knowledge.

Enhancement Four

SQL Statement Support.  Whilst this is already possible for embedded EXCUSRSRC etc, I was thinking more of an interface to do generic SQL processing but use to the 2E method of declaring files and fields etc, have the added bonus of Impact Analysis.

Ability to define an SQL interface using 2E files/Fields and to execute the SQL statement.  This will show up in usages for the files and fields as *SQLSTM


Enhancement Five

License information.  Sure there is a command to show licensing of the product on the box (YDSPLICPRD), but I use it so infrequently I have to look it up almost every time I need it.  Perhaps the model details screens can be extended to have a command key to show the model licensing details.

So come on Broadcom.  Shape your product, prove that you still care.

Thanks for reading.
Lee.

Saturday, June 1, 2019

Slow running batch tasks - A simple method to get you started

Edit - 27/04/2023 - Tidied up SQL formatting and added a note around multi-member files.

"The EOD job is taking too long!", says every system administrator ever!!!!


Tip: First of all if you are reading this please benchmark the program(s) and do one adjustment each time to determine what is making the difference for your situation.  Don't be tempted into making too many changes at once as you'll never learn the value of each approach.  Overtime you will learn which ones make the most difference for your system workload.  Above all, you must remember to re-baseline after each set of changes, as not all changes are equal, some may actually slow down your programs.
There are numerous strategies for improving overall batch performance.  Typically these would included.

  1. Reducing unnecessary random IO - Use Arrays (or memory) instead of disk IO.
  2. Ensure files are opened and closed once per job execution (if possible).
  3. Try to eliminate multiple passes of the same data set.
  4. Breaking the jobs up to perform threaded processing. Comes with a warning!
  5. Remove journaling overheads (if possible).
  6. Reduce record lock contention.
  7. Place independent jobs in parallel in one subsystem. 
  8. Hardware upgrade (CPU, Disk, Memory) etc.
  9. Distributing processing to after a high intensity window (deferred processing)
This is all well and good but how can you get at the information to assist you with identifying where your programmatic problems are.  Many batch processes can nest into dozens of layers deep (both functions and physical program objects).

Obviously there are tools on the IBMi to assist greatly, some are licensed and others are provided by third parties.  I am going to assume you are reading this and you are not yet ready for performance monitoring or job tracing but just have some general batch performance issues and require some quick wins.

Some developers have the skills to just look at a programs architecture and make some compelling changes but most require some hard evidence.  Even if you are one of these programmers with a great deal of insight for your system I'd suggest you do the baseline below to measure your improvements.

To get started I tend to query the database member statistics pre and post execution for the program(s) to determine what occurred.  Note: It is best to capture this data onsite (production) and slot wrap the following commands around the batch program(s).  If you want to reduce noise (data interference) then end as many jobs and subsystems as relevant so that only the IO of the job at hand is being 
captured.

Please note setup commands and queries are highlighted in pink and queries to analyse the results are highlighted in blue.

Capture before details of file(s) IO

SBMJOB CMD(DSPFD FILE(LIBRARY/*ALL) TYPE(*MBR) OUTPUT(*OUTFILE) FILEATR(*PF) OUTFILE(LEED/BF_DTA)) JOB(BF_DTA) 

Execute the batch tasks/programs in question and then capture the after details of the file(s) IO.

SBMJOB CMD(DSPFD FILE(LIBRARY/*ALL) TYPE(*MBR) OUTPUT(*OUTFILE) FILEATR(*PF) OUTFILE(LEED/AF_DTA)) JOB(AF_DTA)                                       

After this you will have two files that you can compare the before and after scenario.  This can give you  insight into what database activity occurred whilst your program/job (set of jobs) were running.  In the example below the SQL refers to the files as BF_DTA and AF_DTA (both in library LEED).  You will need to change these accordingly.

SELECT T01.MBLIB, T01.MBFILE, T02.MBNRCD, T02.MBOPOP, T02.MBCLOP, T02.MBWROP, T02.MBUPOP, T02.MBDLOP,     
T02.MBLRDS, T02.MBOPOP-T01.MBOPOP AS DIFF_OPEN, T02.MBCLOP-T01.MBCLOP AS DIFF_CLOSE, T02.MBWROP-T01.MBWROP AS DIFF_CRT, T02.MBUPOP-T01.MBUPOP AS DIFF_UPD, T02.MBDLOP-T01.MBDLOP AS DIFF_DLT, T02.MBLRDS-T01.MBLRDS AS DIFF_READ, (T02.MBWROP-T01.MBWROP)+(T02.MBUPOP-T01.MBUPOP)+(T02.MBDLOP-T01.MBDLOP) AS DIFF_CUD, (T02.MBUPOP-T01.MBUPOP)/NULLIF(T01.MBNRCD,0) AS IOINTENSE
FROM LEED/BF_DTA T01 INNER JOIN LEED/AF_DTA T02 ON T01.MBFILE = T02.MBFILE AND T01.MBLIB = T02.MBLIB

Note if you have multi members for a given file then the query above should have the WHERE clause extended by focusing on the main physical file member only.  This helps avoid a many to many join scenario.  In my latest environment I append the following.  You can also tune your query to simply omit certain files also.

AND T01.MBFILE = T01.MBNAME

The output is a comparison by file showing the IO differences i.e. Reads, Updates, Opens, Closes etc.

To output this to a file wrap the SQL statement above with the following...

CREATE TABLE LEED/DIFF AS (
 
INSERT SQL STATEMENT ABOVE HERE!!!
 
) WITH DATA

Again, replace LEED with a library of your choice.

Please note that the target file shouldn't exist already and that a library and file name is your choice and will impact the queries below.

This raw data should be enough for you to highlight any performance bottlenecks.  

Additional Queries

As each environment is different here are a few SQL's to execute over the differences file to provide some pointers.


The queries below will help to identify certain database performance scenarios.  I have highlighted the recommended editable values.

High IO Count

SELECT MBLIB, MBFILE, DIFF_CUD FROM LEED/DIFF WHERE diff_cud > 100 ORDER BY diff_cud desc     

Review and see if the IO is commensurate with the number of accounts or clients (records)  being processed etc.  if not, you may have duplication and refactoring could help.

High IO and Triggers

SELECT t01.MBLIB, t01.mBFILE, DIFF_upd FROM LEED/DIFF t01 inner join ytrgctlp t02 on T01.MBFILE = T02.TRGFIL WHERE diff_upd > 100 AND T02.TRGEVT = 'U' and T02.CMTLVL = 1 ORDER BY diff_upd desc       

This will highlight any files with high IO that also have Synon triggers.  Excessive volume may lead to increases in runtime.  Perhaps you have changed objects that are updating records which haven't changed....  Null Update suppression may work here. 

Excessive Reads (Arrays, *QUIT required, Join Logicals)

SELECT MBLIB, MBFILE, DIFF_read, MBNRCD FROM LEED/DIFF WHERE diff_read > 1000 ORDER BY diff_read desc                     

The 1000 (example) figure is very low.   Typically I would be looking for numbers in the millions for a good sized client. 

Excessive Reads for low record count files (Possibility to move to arrays)

SELECT MBLIB, MBFILE, DIFF_read, MBNRCD FROM LEED/DIFF WHERE diff_read > 1000 and mbnrcd < 100 ORDER BY diff_read desc  

Again, review these numbers based on the client database. If you are constantly reading from the same file then these could be committed to memory (array), moved to SSD, loaded in memory etc.

High UPDATES for low volume files (indicates potential contention i.e. a surrogate etc)

SELECT MBLIB, MBFILE, DIFF_upd, MBNRCD FROM LEED/DIFF WHERE diff_upd > 1000 and mbnrcd < 100 ORDER BY diff_upd  desc  

Note: Often people have a surrogate file to get next value for a key.  Especially if you are running parallel processing (either multiple jobs or parallel jobs over one dataset) the parallel jobs can cause record lock contention.

High File Open/Close

SELECT MBLIB, MBFILE, DIFF_open FROM LEED/DIFF WHERE diff_open > 10 ORDER BY diff_open desc     
          
This is used to identify if the task/program has excessive close downs.  Perhaps a routine is set to Closedown = 'Y'.  It is inefficient to keep opening and closing files.  Check the Synon function options within your call stack.

I hope that this information is useful and motivates you to finally have the confidence to look at that long running job.  Using some of these techniques above I have had significant performance improvements.  It is the true IO data that is a reflection of your code and for that you need to use tools or mine the data for yourself.

This, I promise, is a good starting point and as always, I'm happy to help.

Thanks for reading.
Lee.

Wednesday, May 15, 2019

Comment on commenting.

Hello,

Comments are an essential part of any coding practice whether you are using traditional languages that are quite verbose with their syntax and vocabulary i.e. Java, C# or RPG/COBOL.  Even code generation environments like 2E and Plex benefit hugely from appropriate commenting.

Modern low-code platforms like Appian, Mendix and Outsystems (to name a few) who shield you from code (as much as possible) benefit from correctly named functions and comments/annotation within them.

Without comments, what was as relatively simple coding process to the creator is now a moderate pain in the butt for the developer maintaining your code.  Multiply that with a complicated piece of technical logic and/or business logic which is now practically impossible for a maintenance developer to pick up and be successful.

Chances are you will NOT be maintaining your code. Get this into your heads.....

To avoid this, structure your comments professionally and ensure that the comment adds value.

Commenting out old code for safety reasons in the modern world is simply unacceptable.  With repositories like GitHub etc you can be brave and make changes.  Sure, comment some stuff out locally whilst trialing a few ideas....I get it.    But to commit that code to the main branch or the model if programming in Plex/2E is just unforgivable.

If you have got to the point where you have unit tested your code and are 1000% happy, remove the commented out code....NOW.

I'd also go as far to say that you should remove all legacy commented out code at the time you checkout the function...I mean where others have failed before you.  

There are no excuses for leaving commented out code in a production object/branch.

Thanks for reading.
Lee.