Charles Babbage wrote 1837:

"What is the smallest positive integer 
whose square ends in the digits 269,696?"

A solution within the features of the Analytical Engine (AE) would be appropriate, so there is no string processing and no floating-point numbers, no built-in square-root, and the remainder of a division used to determine the trailing digits. The (integral or fixed-point) numbers were rather long, 30 decimal digits. We assume the execution time for an addition to be 1 second, and for a division as well as a multiplication 1 second per digit (excluding leading zeroes). Just to show the order of time, 10 seconds will be assumed for each multiplication and division. (Maybe the division by a power of ten was much quicker.)

The simple but reliable method to probe all intergers (manually entered start point, e.g. 500 for the above example), about 20000 multiplications and the same number of divisions were needed, running about 400,000 seconds, or 111 hours or nearly 5 days. Babbage surely would have never considered to use his precious machine this way.

Instead of enumerating the squares by multiplication, the binominal formula x² = (x-1)² + 2x - 1 could be used to enumerate the squares. Now we have 3 seconds to provide the next square, 10 seconds for the remainder, and 1 second to test the result, i.e. 14*20,000 seconds or 78 hours or 3 days, still too long. (In modern computers, where the multiplication is not slower than an addition, this version will even be longer, as more instructions are performed).

It is well known that any square number can only end in the digits 0, 1, 4, 5, 6 and 9; so the problem is solvable, as it ends in 6. To produce the end digit 6, only numbers with the end digits 4 and 6 need to be considered. Thus, when enumerating squares, we can step by alternatingly by 2 (_4 to _6) and 8 (_6 to _4), which reduces the time to 4000 rounds with 14 seconds each, i.e. 56,000 seconds or 16 hours. The same for the last two digits _96 gives _14, _64, _36 and _96 nees 800 rounds, 11,200 seconds or 3 hours.

Still Babbage as a mathematicion would not use such a raw attempt; but instead try to calculate the set of permissible set of endings: The inital set for 10 as a modulus and 6 as ending digit is {4, 6}. Now the sequences 04, 14, 24, … and 06, 16, 26, … squared and probed for the ending '96', giving the set {14 36 64 86}. The next step adds multiplies of 100 to the numbers of the set, and checks the squares for the ending '696', giving the set {264 236 736 764}. Two more sets, {264 2236 2764 4736 5264 7236 7764 9736} and {264 24736 25264 49736 50264 74736 75264 99736} are determined, where the last one contains two solutions, 25264 and 99736. As the sets have together 26 elements, for each 10 probes have to be done, i.e. 260 divisions. But as various other operations are required, we should estimate 20 seconds for each test, thus 5200 seconds or nearly 2 hours.

An experienced human computer at that time can compare the trailing digits very quickly, so the “division” needs just one second. Moreover, the squares of 4, 14, 24, … are not obtained by multiplication, but the binominal formula:

(x + 10)² = x² + 2*10*x + 100

As only the result modulo 100 is required, only twice the last digit has to be added to the first digit, modulo 100. The addend is 8 for a trailing 4, and 2 for a trailing 6. For the number above, the pattern is in the first digits is recurring, i.e. 1, 9, 7, 5, 3 for the first digit, if 4 is the last digit, and 1, 3, 5, 7, 9 if 6 is the last digit. So it is only left to count the distance, to obtain two new set elements, and if the first square has the wrong ending (i.e. 36² = 296) and an even first digit, the whole series can be skipped. So one test would require not more than 5 seconds, giving 1300 seconds or about 20 minutes or less than half an hour, which I can confirm. (The setup not counted, as that has to be done to program a coumputer anyhow. And the situation for other numbers may not be equally advantageous.)

Note that for the number 971041, the set method needs about the same time, but as the result is 126929 and most numbers have more digits, the machine would require 20*100,000 seconds or nearly 600 hours or 23 days.

The major advantage of the set method is that is terminates if there is no result, while the others will run for ever.

There is an infinite number of solutions. If 'x' is a solution (m is the smallest power of 10 with m<x):

x² mod m = e 

Then:

(x + 5m)² = x² + 10m x + 25m² = x² mod m

and for all z:

(x + z * 10m)² = x² + 20zm + 100m² = x² mod m

There may be more solutions, e.g. for the above example, 150264 is another solution not generated as shown.

Other sample input numbers (all require more than 32 bits for the square):

123456	(nice number)
971041	(largest result found so far with 6 digits)
  3716	(for a 16 bit computer)

Some more information are in an example programme (written in TAV, see https://tavpl.de):

\(
	Charles Babbage wrote 1837:
		"What is the smallest positive integer 
		whose square ends in the digits 269,696?"

	A solution within the features of the Analytical Engine (AE)
	would be appropriate, so there is no string processing
	and no floating-point numbers, no built-in square-root,
	and the remainder of a division used to determine the trailing digits.
	The (integral or fixed-point) numbers were rather long, 30 decimal digits.
	We assume the execution time for an addition to be 1 second,
	and for a division as well as a multiplication 1 second per digit
	(excluding leading zeroes). Just to show the order of time,
	10 seconds will be assumed for each multiplication and division.
	(Maybe the division by a power of ten was much quicker.)

	The simple but reliable method to probe all intergers (manually 
	entered start point, e.g. 500 for the above example),
	about 20000 multiplications and the same number of divisions were needed,
	running about 400,000 seconds, or 111 hours or nearly 5 days. 
	Babbage surely would have never considered to use his precious machine
	this way.

	Instead of enumerating the squares by multiplication, the binominal
	formula x² = (x-1)² + 2x - 1 could be used to enumerate the squares.
	Now we have 3 seconds to provide the next square, 10 seconds for 
	the remainder, and 1 second to test the result,
	i.e. 14*20,000 seconds or 78 hours or 3 days, still too long.
	(In modern computers, where the multiplication is not slower than an addition,
	this version will even be longer, as more instructions are performed).

	It is well known that any square number can only end in the
	digits 0, 1, 4, 5, 6 and 9; so the problem is solvable, as it ends in 6.
	To produce the end digit 6, only numbers with the end digits 4 and 6
	need to be considered. 
	Thus, when enumerating squares, we can step by alternatingly
	by 2 (_4 to _6) and 8 (_6 to _4), which reduces the time to
	4000 rounds with 14 seconds each, i.e. 56,000 seconds or 16 hours.
	The same for the last two digits _96 gives _14, _64, _36 and _96 
	nees 800 rounds, 11,200 seconds or 3 hours.

	Still Babbage as a mathematicion would not use such a raw attempt;
	but instead try to calculate the set of permissible set of endings:
	The inital set for 10 as a modulus and 6 as ending digit is {4, 6}.
	Now the sequences 04, 14, 24, ... and 06, 16, 26, ... squared
	and probed for the ending '96', giving the set {14 36 64 86}.
	The next step adds multiplies of 100 to the numbers of the set,
	and checks the squares for the ending '696', giving the set 
	{264 236 736 764}. Two more sets, 
	{264 2236 2764 4736 5264 7236 7764 9736} and 
	{264 24736 25264 49736 50264 74736 75264 99736} are determined,
	where the last one contains two solutions, 25264 and  99736. 
	As the sets have together 26 elements, for each 10 probes have to 
	be done, i.e. 260 divisions. But as various other operations
	are required, we should estimate 20 seconds for each test, 
	thus 5200 seconds or nearly 2 hours.
	
	An experienced human computer at that time can compare the trailing
	digits very quickly, so the "division" needs just one second.
	Moreover, the squares of 4, 14, 24, ... are not obtained by
	multiplication, but the binominal formula:
		(x + 10)² = x² + 2*10*x + 100
	As only the result modulo 100 is required, only twice the last digit
	has to be added to the first digit, modulo 100.
	The addend is 8 for a trailing 4, and 2 for a trailing 6. 
	For the number above, the pattern is in the first digits is recurring,
	i.e. 1, 9, 7, 5, 3 for the first digit, if 4 is the last digit,
	and  1, 3, 5, 7, 9 if 6 is the last digit.
	So it is only left to count the distance, to obtain two new set elements,
	and if the first square has the wrong ending (i.e. 36² = 296) and 
	an even first digit, the whole series can be skipped.
	So one test would require not more than 5 seconds, giving 1300 seconds
	or about 20 minutes or less than half an hour, which I can confirm.	
	(The setup not counted, as that has to be done to program a
	coumputer anyhow. And the situation for other numbers may not be
	equally advantageous.)
	
	Note that for the number '971041', the set method needs about
	the same time, but as the result is 126929 and most numbers
	have more digits, the machine would require 20*100,000 seconds
	or nearly 600 hours or 23 days. 

	The major advantage of the set method is that is terminates
	if there is no result, while the others will run for ever. 	

	There is an infinite number of solutions.
	If 'x' is a solution (m is the smallest power of 10 with m<x):
		x² mod m = e 
	Then:
		(x + 5m)² = x² + 10m x + 25m² = x² mod m
	and for all z:
		(x + z * 10m)² = x² + 20zm + 100m² = x² mod m
	There may be more solutions, e.g. for the above example, 150264
	is another solution not generated as shown.


	Other sample input numbers (all require more than 32 bits for the square):
		123456	(nice number)
		971041	(largest result found so far with 6 digits)
		  3716	(for a 16 bit computer)

	See also:
		http://rosettacode.org/wiki/Babbage_problem
		http://mathworld.wolfram.com/SquareNumber.html
	
	Options:
		-d	Debug
		-v  verbose, show sets
		-t 	show times

\)

main(args):+
	opts =: row args parse options values ()
	
	? opts{'d'} 
		debug enable
		system info enable
	? opts{'v'} 
		debug enable				\ use better solution

	\ supply default argument(s) if none given
	? args.count = 1		\ no arguments
		args[1] =: "269696"
		args[2] =: "971041"
		args[3] =: "123456"
		args[4] =: "3716"

	\ no scan, args[0] is program name
	row args compact
	?# i =: from 1 upto args.count-1
		determine square ending in args[i] showtimes opts{'t'}
		print ''


										  
\( 	Main processing function for a single ending
	Ending can be a string with leading zeros
\)
determine square ending in (ed) showtimes (showtimes):  
	\ determine modulus as m =: 10^(ed.count)
	m =: 1
	?# i =: from 1 upto ed.count
		m =* 10
	e =: string ed as integer base 10   \ leading zeroes are not octal

	\ check last digit
	ld =: e %% 10
	? ld = 2 | ld = 3 | ld = 7 | ld = 8
		error print "Invalid last digit: " _ ld 
		:> 

	\ calculate sets of endings increasingly
	t =: system cpu seconds		
	xx =: use sets of endings e mod m
	? showtimes
		tt =: ((system cpu seconds) - t) * 1000.0
		tt =: format '%9.1f;' tt
	? xx = ()
		print "No solution"
		:>

	tail =: tt _ " (set method)"
	?# x =: map xx give keys ascending
		x2 =: x * x
		print x __  (split x2 at ed.count) __ tail
		tail =: ''
		?> 				\ print only the first one


	\ Simple, yet efficient only on modern machines: count and square
	t =: system cpu seconds	
	x =: count and square find e mod m
	? showtimes
		t =: ((system cpu seconds) - t) * 1000.0
		tt =: format '%9.1f;' t
	x2 =: x * x
	\ steps =: x - math int sqrt e
	print x __  (split x2 at ed.count) __ tt _ " (count and square)"

	\ Enumerate squares
	t =: system cpu seconds	
	x =: enumerate squares upto e mod m
	? showtimes
		tt =: ((system cpu seconds) - t) * 1000.0
		tt =: format '%9.1f;'  tt
	x2 =: x * x
	print x __  (split x2 at ed.count) __ tt _ " (enum squares linearily)"

	\ Enumerate only squares ending in 4 or 6
	t =: system cpu seconds	
	x =: enumerate squares sparse upto e mod m
	? showtimes
		tt =: ((system cpu seconds) - t) * 1000.0
		tt =: format '%9.1f;' tt	
	x2 =: x * x
	print x __ (split x2 at ed.count)  __ tt _ " (enum multiples of 4 and 6)"


split (x) at (l):
	r =: '' _ x				\ convert to string
	h =: string r upto r.count - l
	t =: string r from -l

	:> h _ '_' _ t


\	Simple method: Count and square
\	================================================
count and square find (ed) mod (m):
	x =: integer ed sqrt
	?* (x*x) %% m ~= ed
		x =+ 1
	:> x
	

\	Still simple: Enumerate squares
\	================================================

\(	Enumerate squares starting below the square root 
	of a limit, until the limit is reached,
	using the binomial formula
		x² = (x-1)² + 2x - 1
\) 
	
enumerate squares upto (ed) mod (m):
	x =: integer ed sqrt
	y =: x * x
	?* y %% m ~= ed
		x =+ 1
		y =+ 2*x - 1
	:> x
	

\	Enumerate sparse
\	=======================================

\(	Enumerate squares starting below the square root 
	of a limit, until the limit is reached.
	Depending on the end digit, the end digits are
	0	0
	1	1,9
	4	2,8
	5	5
	6	4,6
	9	3,7

	Stepping is in increments of 10 using the binomial formula
		x² = (x-10)² + 20x - 100
\) 

enumerate squares sparse upto (ed) mod (m):
	x =: integer ed sqrt
	x =: 10 * (x // 10)		\ must be multiple of 10

	ld =: ed %% 10			\ last digit
	xa =: 0
	xb =: 0
	? ld = 0
		xa =: x
	? ld = 1
		xa =: x + 1
		xb =: x + 9
	? ld = 4
		xa =: x + 2
		xb =: x + 8
	? ld = 5
		xa =: x + 5
	? ld = 6
		xa =: x + 4
		xb =: x + 6
	? ld = 9
		xa =: x + 3
		xb =: x + 7
	! xa ~= 0

	ya =: xa * xa
	yb =: xb * xb
	?* 
		? ya %% m = ed 
			:> xa
		xa =+ 10
		ya =+ 20*xa - 100
		? xb = 0
			?^
		? yb %% m = ed 
			:> xb
		xb =+ 10
		yb =+ 20*xb - 100

	


\	Advanced: determine sets of endings
\	==================================

\ from the old set, the new set is calculated
\ by squaring each number in the old set, plus a multiple of inc
\ 
next set from (old) step (inc) find (ed) mod (m):
	m =: 10*inc						\ truncate
	e =: ed %% m					\ this ending
	new =: {}
	?# x =: give keys old
		\ multiplication used; linear advance is too complex here
		?# y =: from x upto (x+m-1) step inc
			z =: y * y
			z =: z %% m
			\- print ?x __ ?i __ ?y __ ?z __ ?e __ ?m
			? z = e
				new{y} =: y
	:> new
		

show set (x):
	\ print x.count _ ':' nonl
	?# i =: map x give keys ascending
		print i _ ' ' nonl
	print ''

\ returns set of result
use sets of endings (e) mod (m):
	res =: {}
	set =: {}
	set{0} =: 0
	
	\ 
	i =: 1
	?*  i < m
		set =: next set from set step i find e mod m
		? debug is enabled
			show set set
		? set.count = 0
			:> ()					\ no solution
		i =* 10
	
	\ check if result is in final set
	?# y =: give keys set
		z =: y * y
		z =: z %% m
		? e = z 
			res{y} =: y
	:> res

 
\ enumerate powers of 10
powers of (base) upto (lim):
	? $ = ()
		$ =: 1
	|
		$ =* base
	? $ > lim
		$ =: ()
	:> $

\	Helper Functions
\	================

\ Determine smallest power of ten larger than given number
\ As to round up the decimal logarithm is limited in precision,
\ use simple loop, which may even be quicker for small numbers.
xpower of ten above (x):
	p =: 1
	?* p <= x
		p =* 10
	:> p


\+ stdlib