2005
Research, while at Synopsys

Hardware-Software Co-Design of Embedded Reconfigurable Architectures (DAC 2000)
By Y. Li, T. Callahan, E. Darnell, R. Harr, U. Kurkure, and J. Stockwood
Abstract
In this paper we describe a new hardware/software partitioning approach for embedded reconfigurable architectures consisting of a general-purpose processor (CPU), a dynamically reconfigurable datapath (e.g., an FPGA), and a memory hierarchy. We have developed a framework called Nimble that automatically compiles system-level applications specified in C to executables on the target platform. A key component of this framework is a hardware/software partitioning algorithm that performs fine-grained partitioning (at loop and basic-block levels) of an application to execute on the combined CPU and datapath. The partitioning algorithm optimizes the global application execution time, including the software and hardware execution times, communication time, and datapath reconfiguration time. Experimental results on real applications show that our algorithm is effective in rapidly finding close-to-optimal solutions.
Note: due to copyright restrictions, I cannot provide a copy of this paper online; however, it is available through ACM.

Research, while at Rice as a student

Local (Software) Cache Coherence
Cache Coherence Using Local Knowledge (thesis)
Abstract
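The abstract names the four terms Nimble's partitioner trades off but does not give the algorithm, so the following is not it. It is a hypothetical Python sketch: an exhaustive search over a toy cost model in which each loop in an execution trace runs either on the CPU (software time) or on the datapath (hardware time plus communication time), and a reconfiguration cost is charged whenever a hardware loop's configuration is not the one currently loaded. The single-loaded-configuration assumption, the function names `total_time` and `best_partition`, and all numbers are made up for illustration. Exhaustive search is only feasible for a handful of loops.

```python
from itertools import product


def total_time(assign, trace, sw, hw, comm, reconfig):
    """Estimated run time of a loop trace under a CPU/datapath assignment.

    Assumption (hypothetical model): only one configuration fits on the
    datapath at a time, so switching between hardware loops reconfigures.
    """
    time = 0.0
    loaded = None  # loop whose configuration is currently on the datapath
    for loop in trace:
        if assign[loop] == "hw":
            if loaded != loop:
                time += reconfig  # swap in this loop's configuration
                loaded = loop
            time += hw[loop] + comm[loop]  # datapath time plus data transfer
        else:
            time += sw[loop]  # run in software on the CPU
    return time


def best_partition(trace, sw, hw, comm, reconfig):
    """Try every CPU/datapath assignment and return (time, assignment)."""
    loops = sorted(set(trace))
    best = None
    for choice in product(("sw", "hw"), repeat=len(loops)):
        assign = dict(zip(loops, choice))
        t = total_time(assign, trace, sw, hw, comm, reconfig)
        if best is None or t < best[0]:
            best = (t, assign)
    return best


# Two loops that alternate three times at run time.
trace = ["A", "B"] * 3
print(best_partition(trace, {"A": 10, "B": 8}, {"A": 2, "B": 3},
                     {"A": 1, "B": 1}, reconfig=5))
```

With these made-up numbers, putting both loops on the datapath costs 51 time units, barely better than all-software (54), because alternating A and B forces a reconfiguration on every iteration; moving only loop A to hardware wins at 38. This is why reconfiguration time belongs in the objective alongside execution and communication time.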
Automatic Software Cache Coherence Through Vectorization (ICS '92)
by Ervan Darnell, John Mellor-Crummey, and Ken Kennedy
Abstract
On shared-memory multiprocessors, the caches of the individual processors must be kept consistent, i.e., they must have the same view of main memory. This is expensive to maintain in hardware. For machines that provide neither cache coherence nor local hardware assistance, the compiler can produce programs that are guaranteed to be coherent. This paper describes an approach relying on the notion of 'vectorizing'.

Cache Coherence Using Local Knowledge (Supercomputing '93)
by Ervan Darnell and Ken Kennedy
Abstract
Coherence hardware is global in nature; that is, each cache must communicate with every other cache at run time to maintain coherence. Strategies for doing so incur significant time or storage overhead. Instead, compile-time-directed decisions, combined with some local run-time cache knowledge gathered by special hardware, can achieve hit rates nearly as good as those of global strategies. Local strategies suffer no network or contention delays to perform coherence, and their additional storage cost is minimal. This paper presents an algorithm that is ideal in the sense that no local strategy could ever achieve a higher hit rate for a given level of compiler analysis.

PFC Enhancements (Parallel Fortran Converter)
Loop Fusion in PFC (Supercomputer Software Newsletter #17, CRPC)
This describes an enhancement to the PFC parallel code generator that fuses loops wherever possible, consistent with dependence patterns. Fusion reduces synchronization overhead, exposes additional optimization opportunities, and improves cache utilization.
Sum Reduction in PFC (Supercomputer Software Newsletter #13, CRPC)
This describes an enhancement to the PFC vector code generator that expands the class of statements that can be recognized as special reduction idioms, and describes which dependence patterns indicate those reductions.
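Neither newsletter article is reproduced here, and PFC itself transforms Fortran, so the Python sketch below is only a hypothetical illustration of the two transformations, not PFC's code; all function names are made up. The first pair shows a legal fusion: the second loop reads only `a[i]`, which the fused body has already written. The shifted pair shows why dependence patterns matter: a read of `a[i + 1]` would now happen before the write, so naive fusion changes the result. The last pair shows a sum reduction: once `s = s + x[i]` is recognized as a reduction, the additions may be regrouped into per-lane partial sums, roughly as a vector code generator might do.

```python
def unfused(b):
    n = len(b)
    a = [0] * n
    c = [0] * n
    for i in range(n):  # loop 1 writes a[i]
        a[i] = b[i] + 1
    for i in range(n):  # loop 2 reads a[i]
        c[i] = a[i] * 2
    return a, c


def fused(b):
    n = len(b)
    a = [0] * n
    c = [0] * n
    for i in range(n):
        # Legal: the read of a[i] depends only on the write of the same
        # element, and that write still happens first in the fused body.
        a[i] = b[i] + 1
        c[i] = a[i] * 2
    return a, c


def unfused_shifted(b):
    n = len(b)
    a = [0] * n
    c = [0] * (n - 1)
    for i in range(n):
        a[i] = b[i] + 1
    for i in range(n - 1):
        c[i] = a[i + 1] * 2  # reads the *next* element of a
    return c


def naively_fused_shifted(b):
    n = len(b)
    a = [0] * n
    c = [0] * (n - 1)
    for i in range(n - 1):
        a[i] = b[i] + 1
        c[i] = a[i + 1] * 2  # illegal: a[i + 1] has not been written yet
    return c


def serial_sum(x):
    s = 0
    for v in x:  # s = s + x[i]: the classic sum-reduction idiom
        s = s + v
    return s


def strip_sum(x, width=4):
    # Once recognized as a reduction, the additions can be reassociated:
    # keep `width` partial sums (one per vector lane), then combine them.
    partial = [0] * width
    for i, v in enumerate(x):
        partial[i % width] += v
    return sum(partial)


b = list(range(10))
print(unfused(b) == fused(b))                                # True
print(unfused_shifted(b) == naively_fused_shifted(b))        # False
print(serial_sum(b), strip_sum(b))                           # 45 45
```

The integer data keeps the reduction exact; with floating-point data, reassociating a sum can change the last bits of the result, which is one reason a compiler must recognize the idiom deliberately rather than reorder arbitrary statements.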
Other Papers

An Empirical Exploration of the Poincaré Model for Hyperbolic Geometry (Winter '93 Journal of Mathematics & Computer Education)
by Joel Castellanos, Joe Dan Austin, and Ervan Darnell
This paper describes the implementation, and the application to teaching, of a program that allows the user to draw objects in a hyperbolic space and then ask questions about their geometric properties. Many common Euclidean theorems still hold, in surprising ways. Others do not apply, which challenges the user to understand the theorems at a deeper level and to appreciate proper notions of proof.
Non-Euclid: examples, software, and additional information on the above paper.
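The abstract does not describe the program's internals. As a purely illustrative Python sketch, assuming points are given as (x, y) pairs strictly inside the unit disk, the standard distance formula of the Poincaré disk model, d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2))), shows one quantity a tool for exploring the model might compute. The function name `poincare_distance` is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between points u and v of the Poincare disk."""
    nu = u[0] ** 2 + u[1] ** 2  # squared Euclidean norm of u
    nv = v[0] ** 2 + v[1] ** 2  # squared Euclidean norm of v
    if nu >= 1 or nv >= 1:
        raise ValueError("points must lie strictly inside the unit disk")
    diff = (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2  # squared |u - v|
    return math.acosh(1 + 2 * diff / ((1 - nu) * (1 - nv)))


origin = (0.0, 0.0)
print(poincare_distance(origin, (0.5, 0.0)))   # ln 3, about 1.10
print(poincare_distance(origin, (0.99, 0.0)))  # ln 199, about 5.29
```

Equal Euclidean steps cover ever more hyperbolic distance toward the rim of the disk, which is one source of the "surprising ways" in which familiar results appear. The triangle inequality, on the other hand, carries over unchanged, since this formula defines a metric.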