Dr. Bicky Marquez

Ph.D. in Applied Physics

I design photonic hardware for Artificial Intelligence applications.

Love philosophy & oil painting.

The language of analog machines

Analog machines are defined as physical systems whose internal variables evolve according to the entities that they represent. According to Corey J. Maley, analog systems are built around variables in which “the variation in the variable under consideration makes a difference in what is being represented” (Maley 2018). It therefore seems that a primitive form of language is also used to encode information in analog processors. This encoding process aims to map external information into the variable space of the analog machine, where such information is processed. Once this step is fulfilled, we gain access to the results by decoding them. The information is first translated into the language of the analog machine, and then translated back into the original language of the source after being processed.

At this point, we may wonder what makes one system more efficient at solving a task than another if, after all, both analog and digital machines require an encoding-decoding model of communication to process data and find solutions. Why, for instance, should we consider analog devices at all if digital language allows us to tackle most problems that we can imagine? Efficiency when solving those problems may be a partial answer, and the complexity of the code makes the big difference here. In digital computing, a system might need more bits of precision to perform better on machine learning applications, but this makes the process less efficient: slow and computationally costly. On the other hand, fewer bits would make the processing step faster and cheaper, but less accurate. Compared to their digital counterparts, analog computers use less complex systems of symbols and arguments, embedded in their structures, to represent the same information.
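As a rough illustration of this precision trade-off, the sketch below (using a synthetic sine wave purely as a stand-in for an analog signal, not any particular machine) uniformly quantizes the same signal at different bit depths and reports the resulting error.

```python
# Minimal sketch: uniform quantization of a signal to n bits, to show how
# fewer bits trade accuracy for a smaller, cheaper representation.
import numpy as np

def quantize(signal, n_bits):
    """Uniformly quantize a signal in [-1, 1] to 2**n_bits levels."""
    levels = 2 ** n_bits
    # Map [-1, 1] onto integer codes, then back to the signal range.
    codes = np.round((signal + 1.0) / 2.0 * (levels - 1))
    return codes / (levels - 1) * 2.0 - 1.0

t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 5 * t)          # a stand-in "analog" signal

for n_bits in (4, 8, 16):
    error = np.sqrt(np.mean((x - quantize(x, n_bits)) ** 2))
    print(f"{n_bits:2d} bits -> RMS quantization error {error:.2e}")
```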

A set of symbols and constraints characterizes the primitive language that can be implemented in an analog machine. For information processing, some languages should be better than others at handling specific types of data. From our experience, basic mathematical operations are best performed by complex and accurate languages. Accuracy is the only possibility to consider when performing basic mathematical operations: their results can never be associated with a degree of confirmation - they must simply be correct. It is unacceptable to have a result such as “four plus four is equal to eight” with 98% accuracy; we expect a result equal to “eight” with 100% accuracy every time. Yet for some reason we accept degrees of confirmation for the results of many other tasks that are just as important. Accuracy is the Holy Grail that everyone would like to have on their side at any given time. For some datasets this aim may be difficult or even impossible to achieve; processing large amounts of continuous data, for example, is a challenging task. Analog information processing is usually highly convenient when working with information in continuous-time form, since no information is discarded. Specifically for time series prediction, the lack of continuity when processing past states of a continuous-time system can lead to prediction flaws. As an example, it is well known that continuous-time chaotic time series prediction is usually performed using digital machines. Since it is not physically possible to store truly continuous sequences in a digital machine, the set of past states that we can use to train and build a predictor is limited. Researchers must therefore look for methods to generate more samples in between those known past states, artificially increasing the resolution of the input sequence and approximating the data towards its continuous original form (Weigend and Gershenfeld 1993). If we could work directly with systems able to handle continuous-time variables, it would be possible to improve the performance of our predictors.
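As a minimal illustration of this resampling idea (not the specific methods surveyed by Weigend and Gershenfeld), the sketch below treats a densely sampled waveform as a proxy for the “continuous” signal, keeps only a sparse subset of past states, and interpolates between them to artificially increase the resolution.

```python
# Minimal sketch: upsampling a coarsely stored record of a continuous signal
# by interpolation, to approximate its original continuous-time form.
import numpy as np

t_fine = np.linspace(0.0, 10.0, 2001)            # proxy for the "true" continuous signal
x_fine = np.sin(t_fine) + 0.5 * np.sin(3.1 * t_fine)

t_coarse = t_fine[::50]                           # the limited set of stored past states
x_coarse = x_fine[::50]

x_interp = np.interp(t_fine, t_coarse, x_coarse)  # artificially increased resolution

rms_gap = np.sqrt(np.mean((x_fine - x_interp) ** 2))
print(f"RMS gap between interpolated and 'continuous' signal: {rms_gap:.3f}")
```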

Indeed, analog computers are designed to efficiently simulate physical systems in both discrete and continuous forms. As exemplified above, analog machines are especially convenient when working with continuous systems. Moreover, complex function approximation also harnesses the various advantages of working with analog architectures. One of the most powerful function approximators is the neural network, in its two forms: biological (the brain) and artificial. The reduced accuracy of neural networks is compensated by the time and energy they save when solving a complex task. However, using digital computers and algorithms as the basis for neural network design is a stage that will have to be surpassed, since it remains inefficient. The most pressing bottleneck for AI is now processing power. The amount of compute required to train state-of-the-art AI has increased exponentially over the last six years, with a doubling period of 3.5 months (Amodei and Hernandez 2018). Electronic architectures face fundamental limits: Moore’s law is ending, and transistors are projected to stop shrinking by around 2021 (ITRS 2015). Moving data electronically on metal wires is intrinsically bandwidth-limited (Miller 2009) and energy inefficient, so electronic deep learning hardware accelerators are often starved of input data (Chen et al. 2017).
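Taken at face value, a 3.5-month doubling period compounds dramatically; the back-of-the-envelope check below simply computes the growth factor implied by the figures quoted above over a six-year span.

```python
# Growth factor implied by a 3.5-month doubling time (Amodei and Hernandez
# 2018), compounded over six years.
months = 6 * 12
doubling_period = 3.5
print(f"Implied growth factor: {2 ** (months / doubling_period):.2e}")  # roughly 1.6e6
```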

Recently, photonic (i.e., light-based) technology has been investigated as a way to accelerate information processing and reduce power consumption. Silicon photonic waveguides bus data at the speed of light. The challenge is to reduce or control the noise associated with analog devices. On the other hand, the fact that we can embed the variables that model the functions to be approximated directly in analog devices is a huge advantage, since it partially resolves the problem of computational complexity (Bergadano 1991). Using analog machines, a task can be solved efficiently with respect to digital computers. Computational complexity reduction is here associated with language complexity minimization, where the encoding-decoding process uses fewer symbols and arguments to represent the same information.

a. Communication and processing

In any case, both analog and digital encoding-decoding processes may introduce unwanted distortions into the data. For instance, digitizing a picture converts an image into a series of bits that recreate a digital copy of that image. The image is described more accurately as the number of bits increases. If we transmit this information to other users, the quality of the communication between sender and receiver may be affected by the number of bits. When this data is used for digital image processing with machine learning, the results can be affected by the nature of the distorted data. If, as Wittgenstein’s picture theory would put it, “the essential features of what is being represented does not correspond with the original picture” (Stern 1995), then digital language triggers a conflict for effective communication between the source and the processor. Without even an approximately correct representation of the features composing the original data, it is hard to build arguments that connect the targets/conclusions with the available premises/features. Communication between source and processor becomes more accurate as the number of bits increases. Such encoding needs demand more sophisticated machines, working, in our era, with 64 bits or more instead of 32 bits or fewer. This, in turn, brings the problem back to high computational cost and the machine power needed to sustain those aspirations. In fact, in the absence of analog machines that can replace digital computing, researchers in computer science are designing algorithms that provide some partial relief for big data processing. In particular, they work with low-precision neural networks, reducing the floating-point precision used for image and speech classification (Gallus and Nannarelli 2018). This optimization allows researchers to reduce the time, memory and computational power demanded when classifying images and speech.
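As a toy illustration of this precision-versus-cost compromise (a hypothetical random network, not the model studied by Gallus and Nannarelli), the sketch below casts the weights of a small multilayer perceptron to half precision and measures how far its outputs drift from the full-precision ones.

```python
# Minimal sketch: compare a tiny random MLP evaluated in 64-bit and 16-bit
# floating point, to show the output drift introduced by reduced precision.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(32, 64)), rng.normal(size=(64, 10))
x = rng.normal(size=(100, 32))                   # a batch of dummy inputs

def forward(x, W1, W2):
    h = np.tanh(x @ W1)                          # hidden layer
    return h @ W2                                # linear output layer

full = forward(x, W1, W2)
low = forward(x.astype(np.float16), W1.astype(np.float16), W2.astype(np.float16))

drift = np.max(np.abs(full - low.astype(np.float64)))
print(f"Max output drift from 64-bit to 16-bit weights: {drift:.3e}")
```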

Furthermore, the encoding-decoding process faces many other, previously unconsidered challenges when used for hybrid problems such as those found in psychological and/or social circumstances. As efficient living information-processing systems, human beings face the issue of translating or communicating information from a basic space (related to efficient processing) to more complex contexts (the ultimate output space). For instance, in international social situations, people from a certain community have to explain the results of their internal processes to other communities with high accuracy. The information has to flow all the way from the most primitive processor, embedded in a person’s body in a particular context, to a hybrid output space where it has to interact with networks of human-embedded processors within a different context. This flow requires translating the whole of our decision-making path in order to justify to others every detail related to the achieved result. We translate our methodologies, rules and outputs to people in our community first, and then to others. In this way, many parts of such information can be distorted as people interpret it according to their own internal mechanisms and references. Consequently, society typically translates information with large uncertainty. Hybrid information-processing engines therefore introduce high uncertainties, which trigger conflicts for effective communication.

Such sources of uncertainty can be explained through the formation of prediction regions. These regions depend largely on the prior information and sample sets that humans have stored throughout their lives. In addition, humans’ prediction regions also depend on their beliefs and reasoning, which complement the sample set used to train their predictive machines. These regions seem to be non-static in humans; they change dynamically with internal and environmental conditions. Prediction regions allow us to make accurate decisions in real-life situations. Decisions in dynamic environments are therefore derived from both real samples and beliefs. Such beliefs do not qualify as noise; they can be interpreted as artificial samples, a kind of extrapolation obtained from real data (Nilsson 2014). The incorporation of beliefs in our predictors therefore seems related to building a predictor based on how things ought to be according to a person or community. Clearly, such systems will most probably output highly uncertain results. High uncertainty translates into a predicted result with a lower probability of being true, but one that is still useful for a person’s dynamic, daily decision-making. Dynamic prediction regions are therefore based on artificial samples, or mental extrapolations, as well as real data. Such a dynamic prediction region cannot easily be communicated to others, because usually it cannot easily be justified, and it keeps being updated as new extrapolations appear.
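The following sketch is only a loose, illustrative analogy for this idea (the linear model and all numbers are invented): a prediction region estimated from real samples alone is compared with one estimated after adding belief-like extrapolated samples, and the latter typically ends up wider.

```python
# Minimal sketch: residual spread (a crude proxy for the width of a
# prediction region) with real samples only versus real plus "belief"
# samples extrapolated beyond the observed range.
import numpy as np

rng = np.random.default_rng(1)
x_real = rng.uniform(0.0, 5.0, 30)
y_real = 2.0 * x_real + rng.normal(0.0, 0.5, 30)       # noisy real observations

# "Beliefs" as artificial samples: extrapolations beyond the observed range.
x_belief = np.linspace(5.0, 8.0, 10)
y_belief = 2.0 * x_belief + rng.normal(0.0, 2.0, 10)   # extrapolated, more uncertain

def fit_and_spread(x, y):
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return slope, np.std(residuals)                    # spread ~ width of the region

print("real only:      slope %.2f, residual spread %.2f" % fit_and_spread(x_real, y_real))
print("real + beliefs: slope %.2f, residual spread %.2f" %
      fit_and_spread(np.concatenate([x_real, x_belief]),
                     np.concatenate([y_real, y_belief])))
```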

b. Universality

An important additional advantage of complex language-based systems over analog models is their universality: they can be manipulated and adapted to tackle any problem. Tasks are solved through a universal language that dictates the set of instructions processed by the specific hardware embedding that language. For analog systems, such instructions do not need to be specified, since the system has them integrated in its architecture, like an algorithm physically represented. If we wanted fully multi-purpose analog systems, we would need to create many specialized analog modules, each helping to solve a different task. If instead we think of them as objects specialized towards one single problem, then we could expect to need an infinite number of them, just as in real life we deal with infinite degrees of freedom interacting with us, each of which probably deserves a specialized agent. However, human beings typically do not solve or face all these problems with high accuracy, because their information-processing system is not well optimized for all of them. This means that we probably reuse some modules to solve very similar tasks that those modules were not meant to solve by default. Such models would share properties with evolutionary psychology, where modules are highly specialized. However, human beings cannot afford infinite modules, since these would require infinite energy to handle all the degrees of freedom that surround them throughout their lifespans. Most likely, hybrid systems are created from combinations of modules that balance time-energy constraints. Such models could be seen as hierarchies.

One of the most fascinating things about this representation-based approach is that it makes predictions without needing rigorous reasoning based on truth tables. Issues arise when attempting to universalize our conclusions and methodologies. Nevertheless, there seems to be no need to universalize the rules of thought when making an automatic, quotidian choice. In such cases, many of the most efficient and successful individual choices could be seen as unproven universals, individually helpful for plain daily living. For reasoning, it is possible to use valid inferences in representational systems to build prediction regions. Feature-based predictive systems may perceive connections (not rigorous reasons) that are radically different from what can be reasoned using a universal language. The fascinating side of this contrast is that both methods (pattern-based or rigorous reason-based) can reach the same conclusion, perhaps with different degrees of confirmation. Universal language-based systems allow us to achieve more accurate results, but at the expense of energy efficiency and speed. In the machine learning literature, we find that different methods solve the same problem with different performances. There is not necessarily a unique way to solve those problems, but some methods can reduce the prediction error considerably. Higher accuracy when solving a task is desirable if there are no time and energy constraints, which could otherwise lead us to problems of computational complexity.


References

Amodei, D. & Hernandez, D. (2018). AI and compute. OpenAI.

Bergadano, F. (1991). The Problem of Induction and Machine Learning. In Proc. Int. Joint Conf. on Artificial Intelligence, 1073–1079.

Chen, Y., Krishna, T., Emer, J. S. & Sze, V. (2017). Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits, 52(1), 127-138.

Gallus, M. & Nannarelli, A. (2018). Handwritten Digit Classification using 8-bit Floating Point based Convolutional Neural Networks. DTU Compute Technical Report 2018, Vol. 01.

International Technology Roadmap for Semiconductors (ITRS) 2.0 (2015).

McCulloch, W. & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.

Maley, C. J. (2018). Toward Analog Neural Computation. Minds and Machines, 28, 77-97.

Miller, D. A. B. (2009). Device requirements for optical interconnects to silicon chips. Proceedings of the IEEE, 97, 1166-1185.

Nilsson, N. J. (2014). Understanding Beliefs. MIT Press Essential Knowledge series.

Stern, D. G. (1995). Wittgenstein on Mind and Language. Oxford: Oxford University Press.