File:  [LON-CAPA] / loncom / localize / localize / checkduplicates.pl
Revision 1.2
Wed Apr 8 15:10:22 2009 UTC (15 years, 1 month ago) by bisitz
Branches: MAIN
CVS tags: version_2_9_99_0, version_2_12_X, version_2_11_X, version_2_11_4_uiuc, version_2_11_4_msu, version_2_11_4, version_2_11_3_uiuc, version_2_11_3_msu, version_2_11_3, version_2_11_2_uiuc, version_2_11_2_msu, version_2_11_2_educog, version_2_11_2, version_2_11_1, version_2_11_0_RC3, version_2_11_0_RC2, version_2_11_0_RC1, version_2_11_0, version_2_10_X, version_2_10_1, version_2_10_0_RC2, version_2_10_0_RC1, version_2_10_0, loncapaMITrelate_1, language_hyphenation_merge, language_hyphenation, bz6209-base, bz6209, bz5969, bz2851, PRINT_INCOMPLETE_base, PRINT_INCOMPLETE, HEAD, GCI_3, BZ5971-printing-apage, BZ5434-fox, BZ4492-merge, BZ4492-feature_horizontal_radioresponse
Heavily optimized search for duplicate keys:
- Read translation file only once and directly count key occurrences
  (inclusion of lexicon hash not needed anymore; thanks to Stefan Droeschler for the idea)
- More flexible key matching pattern
  (leading whitespace)
- Optimized key matching pattern (quotes)
- Now also prints the number of occurrences of each duplicate key
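
The single-pass counting described in this log message can be sketched in isolation. The snippet below is only an illustration and is not part of the committed script. The sample lines are invented, and it assumes a layout in which each key sits on its own indented line with its value on the following line.

```perl
use strict;
use warnings;

# Hypothetical sample data (not taken from a real translation file).
# Assumed layout: key on an indented line, value on the next line.
my @lines = (
    "# a comment line\n",
    "   'Save'\n",
    "=> 'Speichern',\n",
    "   'Cancel'\n",
    "=> 'Abbrechen',\n",
    "   'Save'\n",
    "=> 'Sichern',\n",
);

# Single pass: count every key as it is seen.
my %counter;
for my $line (@lines) {
    next if $line =~ /^\s*#/;                          # ignore comments
    $counter{$1}++ if $line =~ m/^\s+["'](.*)["']/;    # indented, quoted key
}

# Report keys seen more than once (sorted for stable output).
for my $key (sort keys %counter) {
    print "Found $counter{$key} times key: $key\n" if $counter{$key} > 1;
}
```

Run as is, this prints `Found 2 times key: Save`. The value lines start with `=>` rather than whitespace, so the pattern skips them and only the key lines are counted.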

#!/usr/bin/perl
# The LearningOnline Network with CAPA
# $Id: checkduplicates.pl,v 1.2 2009/04/08 15:10:22 bisitz Exp $

# 07.04.2009 Stefan Bisitz
# Optimization ideas by Stefan Droeschler

use strict;
use warnings;

my $man = "
checkduplicates - Checks whether hash keys in a translation file occur more than once. If so, a warning is displayed.

The found keys and corresponding values need to be changed. Otherwise, there is no guarantee which value is taken. This is dangerous if the same key is used with different values, or if one value is changed but the screen still shows the old value, which actually comes from the other occurrence.


SYNOPSIS:\tcheckduplicates -h 
\t\tcheckduplicates FILE

OPTIONS:
-h\t\tDisplay this help and exit.

";

my $filename; 
die "Use option -h for help.\n" unless exists $ARGV[0];
#analyze options
if ( $ARGV[0] =~ m/^\s*-h/ ) {
	print $man;
	exit();
}else{
	$filename = ($ARGV[0]);
	die "$filename is not a file.\n" unless -f $ARGV[0];
}


# ----------------------------------------------------------------
# Start Analysis
print "checkduplicates is searching for duplicates in $filename...\n";

# Manually read all stored keys from translation file (including possible duplicates)
# and count key occurrences in a separate hash.
my %counter;
open( my $fh, "<", $filename ) or die "$filename cannot be opened: $!\n";
while ( my $line = <$fh> ) {
    next if $line=~/^\s*#/; # ignore comments
    #$exprNP=~s/^["'](.*)["']$/$1/; # Remove " and ' at beginning and end
    if ($line =~ m/^\s+["'](.*)["']/) { # Find "..." or '...' key
        $counter{$1}++;
    }
}
close($fh);

# Print all keys which occur more than once
my $dupl = 0; # number of keys that occur more than once
foreach my $count_key (keys %counter) {
    my $count_value = $counter{$count_key};
    if ($count_value > 1) {
        print 'Found '.$count_value.' times key: '.$count_key."\n";
        $dupl++;
    }
}

if ($dupl == 0) {
    print "Be happy - No duplicates found.\n";
} else {
    print "--- Found $dupl duplicate(s) in $filename which need to be corrected!\n";
}

# ----------------------------------------------------------------

